Part 1 Proposal for JDK-8264594
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu May 20 09:00:27 UTC 2021
On 20/05/2021 02:34, leerho wrote:
> Maurizio,
> Sorry about my radio silence, I was pulled onto another urgent
> project, but now I'm back :)
> From your email above:
>
> E.g. let's keep the MemorySegment API thin, by adding only one method:
>
> /copyFrom(MemorySegment, MemoryLayout elemLayout, ByteOrder order)/
>
> Which generalizes the existing copyFrom, allowing for swap copy.
>
>
> I am a little unclear how the user would use this to copy some
> primitive array to/from a MemorySegment. Let me try:
There were two parts to my proposal :-)
1. add a copyFrom method which works from segment <-> segment but also
takes an endianness/swappiness
2. add a MemoryCopy (parallel with MemoryAccess) full of static methods
to copy things from java arrays to segments and back
You need (1) to implement (2). (1) is a primitive, (2) is a class with
user convenience static methods.
Maurizio
>
> For example: to copy a double array into a MemorySegment would require:
>
> //Given:
> int arrLen = N;
> double[] srcArr = new double[arrLen];
> ResourceScope scope = ...
> MemorySegment dstSeg = ... big enough, but we need to load at a dstOffset.
>
> //prepare the source segment:
> MemoryLayout doubleLayout = valueLayout(64, ByteOrder.nativeOrder);
> MemoryLayout seqLayout = sequenceLayout(arrLen, doubleLayout);
> MemorySegment srcSeg = MemorySegment.ofArray(srcArr);
>
> //prepare the destination segment:
> dstSegSlice = dstSeg.asSlice(dstOffset, arrLen*8);
> dstSeg.copyFrom(srcSeg, seqLayout, ByteOrder.nativeOrder);
>
> Whew! This can't be right. I must be doing something wrong. How did
> you intend for this new method to be used?
>
> Cheers,
>
> Lee.
>
>
>
>
>
>
>
> On Thu, Apr 29, 2021 at 3:23 AM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com
> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>
>
> On 29/04/2021 05:49, leerho wrote:
>> Maurizio,
>>
>> First, I would like to clear up some misunderstandings.
>>
>> 1. Re overloading with/without ByteOrder:
>>
>> My feeling is that having a "more friendly" overload for what
>> is a pretty advanced method anyway feels some sort of middle
>> ground that would double the size of the API w/o really
>> making it more usable (I don't thing users like you would be
>> bothered by an extra `nativeOrder()` suffix). At least that's
>> what I was assuming.
>>
>>
>> I was trying to make the distinction between 2 vs 4 methods,
>> assuming Object array), and 14 vs 28 methods assuming individual
>> XYZArrays. The second case is clearly a more egregious impact on
>> the API so I would definitely agree that not overloading the
>> ByteOrder makes sense. *And yes, I can live with that :)*
>>
>> The 2 vs 4 methods refers to the next item.
> Yeah - I realized later that you were considering different cost
> vs. benefit ratios in the two proposals - in one case it seems to
> cost more than the other, I agree. (I also agree, in general, that
> if we could provide a ByteOrder-less version, it would be nice to
> have).
>>
>> 2. Use of Object array:
>>
>> Accepting an Object parameter is inconsistent with e.g. the
>> MemorySegment.of(...) factories, which take sharp array
>> types. In principle those factories could be compressed down
>> into one (in fact the API started off that way, but we went
>> back for performance issues).
>>
>> That is why I posed the question wrt use of Object array:
>>
>> Concern: Performance. I would think we would be moving work
>> that could be done at compile time (e.g., what type of array
>> is it? ) to run time. Or would Hotspot figure this out ?
>>
>>
>> I wasn't sure, which is why I asked. And you indicated that
>> performance with an Object array is an issue *and I believe you* :)
>
> There is a difference between MemorySegment.of(Object) and
> MemorySegment.copyInto(Object). In the first operation we are
> "creating" a new segment; I have observed that, in general, it is
> better to have creation of new instances as a transparent process
> as possible: if different methods create different concrete Java
> instances (for instance MemorySegment::ofArray(int[]) creates a
> different object than MemorySegment::ofArray(double[])) the code
> tends to be more "transparent" to the optimizing compiler, and
> typically that works better than having a single factory which
> does a bunch of stuff/control flow and might return different
> objects (which will necessarily create some pollution). I believe
> the case of memory copy is not dissimilar from what
> MemoryAccess.getInt does: note that you just pass a `memory
> segment` - we don't have overloads for "heap segment", "mapped
> segment" and "native segment" and yet C2 is able to figure that
> out (because we turned enhanced profiling on for these methods). A
> similar trick, I'm sure could be done here, if we were willing to.
> Of course we need benchmarks to test and validate the approach.
>
>>
>> 3. Native implementations:
>>
>> You seem to assume that "native implementation" == better
>> performance.
>>
>>
>> No, I don't really. My only comment about native methods is that
>> I don't know where to find the C++ code so I can't figure out
>> what is going on under the covers. Additionally, as you pointed
>> out, the C2 hotspot compiler has additional profiling it can take
>> advantage of ... and I even know less about that! *So I'll leave
>> it to you what methods should be made native*.
>
> So, there are two routines for copying memory: System::arrayCopy
> and Unsafe::copyMemory. Performance-wise they behave the same - I
> believe one minor difference is that System.arrayCopy inserts some
> GC safe point so that, in the middle of a big copy, it can still
> pause for GC - Unsafe::copyMemory (the one we use) doesn't do
> that. They are both native routines, and are both "intrinsified"
> by the VM - which means that when they are called enough time, the
> VM replaces them with bits of machine code generated by the
> optimizing compiler. This is where information about types becomes
> crucial: if, when the intrinsification occurs, C2 can prove that
> e.g. you are copying from off-heap into a int[], then a sharper
> copy can be performed. If the types are opaque (so C2 doesn't know
> whether they are on/off heap), then performance suffers. So, as
> long as the types are kept "sharp", this is typically not an issue.
>
> There's also another important consideration: copyMemory (or
> System::arrayCopy) is, relatively speaking, a "slow" operation
> (since it can touch many addresses). In my experience it is way
> less likely to observe performance issues with this routines than
> it is with a single-shot memory access (which boils down to a
> single MOV, so every extra overhead counts big time!).
>
>
>>
>> *Comment*: In our Apache DataSketches library we have implemented
>> all of our sketching algorithms in both C++ and in Java. And we
>> have experienced a number of cases where our highly-tuned java
>> implementation handily beats a casually constructed C++
>> implementation! We have found that there are similar issues in
>> both languages. If the developer tends to over-rely on the
>> general-purpose STL librarys in C++ it will impact performance in
>> the same way that over-reliance on the Java Collections library
>> will. But a well-tuned C++ implementation will generally beat a
>> well-tuned Java implementation, but not by very much! And with
>> this Panama module, I am hoping we can make the difference even less.
>>
>> I think we can agree that the above 3 issues can be put aside and
>> don't need further discussion..
> ok
>>
>> ****
>> Back to the method design:
>>
>> I think if we use an instance method we should just drop the
>> length parameter from the method and infer the length from
>> the receiver segment.
>>
>>
>> This does present some usability concerns as I will highlight next.
>>
>> Thoughts on your 2 X 2 table:
>> I think we don't need to focus on the two cases where either the
>> segment or the array does not exist.
>
> Sure - I was showing them to demonstrate the emerging symmetry/pattern
>
>
>> These can already be handled with the current API. Now for the
>> other two cases:
>>
>> 1. From array to existing segment, you wrote:
>>
>> segment [.asSlice(...)] .copyFrom(array, index, order)
>>
>>
>> My translation:
>>
>> dstSegment.asSlice(long dstOffsetBytes, long
>> dstLengthBytes).copyFromArray(XYZ[] srcArray, int srcIndex,
>> ByteOrder order);
>>
>> * The user generally has better knowledge of the array
>> coordinates (start, length). Here the user would have to
>> translate from array srcElements to dstLengthBytes. I don't
>> consider this friendly.
>> * I don't think you intended to overload the existing
>> copyFrom(segment), thus, copyFromArray(...).
>> * The new methods /copyFromArray(XYZ[] srcArray, srcIndex,
>> order) are /not as general purpose as they would it be if it
>> included the array length argument, but leads to the problem
>> of dual specification of the length both in bytes and array
>> elements.
>>
> I think of these three I can see the point of the first bullet -
> but it feels like having offset/length on the copyFrom feels
> slightly redundant (since asSlice does kind of the same thing, and
> that's what we tell people to do for segment->segment copyFrom).
> Anyway, point taken.
>> 2. From segment to existing array, you wrote:
>>
>> segment [.asSlice(...)] .toXYZArray(array, index, order)
>>
>>
>> My translation:
>>
>> srcSegment.asSlice(long dstOffsetBytes, long
>> dstLengthBytes).copyToArray(XYZ[] dstArray, int dstIndex,
>> ByteOrder order)
>>
>> * Again, the user would have to translate from his dstElements
>> to dstLengthBytes.
>> * Again, the new methods /copyToArray(XYZ[] dstArray, dstIndex,
>> order)/ are not as general purpose as mentioned above. So the
>> above comment about the copyFromArray(...) apply here as well.
>>
>> My conclusion, so far, is that splitting our task into 7 new
>> copyFromArray(...) methods and 7 new copyToArray(...) methods
>> that must operate in a chain, provides a clumsy solution.
>
> I think "clumsy" here might be a bit of a glass-half-empty
> comment. And some points as to why the glass is perceived to be
> half empty are probably well taken (e.g. translation of
> coordinates) but I think there is a general "I don't want to
> slice" feel to all this, which sounds like jumping into premature
> optimization. I'd like to make sure we resist that temptation, no
> matter what we end up doing in the end.
>
>
>
>>
>> *So what is our task?*
>>
>> What we want our method(s) to accomplish, *at the lowest level,*
>> is actually very simple. We want to copy /lengthBytes/ from one
>> blob of bytes to another blob of bytes, each with a specific
>> offset. As a diagram, it looks like this:
>> ArrayCopyForMemorySegment.png
>> Let's forget about arrays of XYZ for a moment. We could
>> accomplish this with a simple method like :
>>
>> void copy(source, srcOffsetBytes, destination,
>> dstOffsetBytes, lengthBytes);
>>
>>
>> The beauty of the unsafe methods is that under the covers,
>> everything is bytes. And if you /unsafe.putInt//(offset, int)/
>> into memory, unsafe views it as just 4 bytes and it forgets that
>> it ever was an integer.
>>
>> A MemorySegment is just a blob of bytes somewhere in memory. So
>> if we just had
>>
>> void copy(srcSegment, srcOffsetBytes, dstSegment,
>> dstOffsetBytes, lengthBytes);
>>
>>
>> *This would already be a powerful method on its own! (And we
>> don't have this!)*
>
> Ok, now the thing I was afraid of (premature optimization) is
> sneaking in full - it's not true we don't have support for the
> above - we do, but not in the way you described:
>
> sourceSegment.slice(....).copyFrom(destSegment.slice(...))
>
> Combining copyFrom and asSlice you can obtain whatever combination
> of copying you can obtain with low level Unsafe::copyMemory.
>
> We're not talking about missing primitives here (it's all in
> there). And, simplicity is often in the eye of the beholder: you
> seem to imply that a method taking 5 (!!) arguments is better than
> having a 1-arg method and moving the slicing closer to where it
> belongs. I think I object to that. In the slicing version there's
> no ambiguity as to what parameters refer to.
>
> The main reason as to why I would see someone prefer the 5
> argument version is, again, some notion that slicing costs too much.
>
>
> Anyway - copying segment to segment is a completely different
> problems from copying segments into an existing array. I believe
> the former is addressed by the API (and I don't see a significant
> margin of improvements there). In the latter case, I agree there
> is too much convoluted-ness going on, with creating a temp segment
> for holding heap memory and what's not.
>
>
>> In addition, we would like to enhance this method so that either
>> the source or the destination could be one of the 7 primitive
>> Java arrays.
>>
>> The task for the method that needs to do this is trivial: To
>> convert from array units to bytes is a simple left shift. To
>> convert from bytes to array units is a simple right-shift
>> instruction. Both of these are single-cycle CPU instructions.
>> The /copy/ should be a low-level bulk copy operation. This method
>> should be as fast as a comparable operation in C/C++. I don't
>> see any reason why this couldn't be implemented as a native
>> method. In fact, the lowest level of this operation could
>> probably be accomplished with a few machine instructions.
>> (However, a really powerful bulk array copy that takes advantage
>> of cache-line sizes and non-aligned offsets is more complex; but
>> this kind of copy method should already exist in the JVM.)
>>
>> For arrays, the simplest solution, IMHO, would be 7 copyToArray()
>> methods and 7 copyFromArray() methods structured as follows:
>>
>> void copyFromArray(XYZ[] srcArray, int srcStartIndex, int
>> srcElements,
>> MemorySegment dstSegment, long dstOffsetBytes, ByteOrder
>> order);
>>
>> and
>>
>> void copyToArray(MemorySegment srcSegment, long srcOffsetBytes,
>> XYZ[] dstArray, int dstStartIndex, int dstElements,
>> ByteOrder order)
>>
>> * The simple translation of array units to bytes and the
>> reverse is handled by the method.
>> Thus, the user does not need to bother doing this, which
>> helps eliminate human error.
>> * There is no duplication of the length specification, it is
>> only specified once.
>> * The method names are self explanatory including the relevant
>> units, where relevant
>> * As these methods are self contained and with no escapes. It
>> could easily be a static method.
>> * These methods solve the complete problem. No need to chain
>> multiple methods together.
>> * These methods could be part of MemorySegment, MemoryAccess,
>> or its own class.
>>
> If this path is preferred, and we also join in with Radoslaw
> comment, I think I'd like to add all the required overloads
> (including ByteOrder-less ones) into a new class: MemoryCopy.
>
> E.g. let's keep the MemorySegment API thin, by adding only one method:
>
> copyFrom(MemorySegment, MemoryLayout elemLayout, ByteOrder order)
>
> Which generalizes the existing copyFrom, allowing for swap copy.
>
> And then let's build on this, and provide several static methods
> on the side, like we do with MemoryAccess.
>
> Maurizio
>
>
>> Cheers, and thank you for your patience!
>>
>> Lee.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Apr 28, 2021 at 2:49 AM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com
>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>
>> Hi Lee
>> IMHO some of the disadvantages you mention about 2 are a bit
>> accidental - and not dissimilar to what happens here:
>>
>> https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/Collection.html#toArray(T%5B%5D)
>> <https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/Collection.html#toArray(T%5B%5D)>
>>
>> But I can understand that this argument might not be
>> persuading you.
>>
>> For endianness, I think that there is a difference between:
>>
>> * I have a segment and I want to read an int
>> * I have a segment and I want to copy parts of it in an
>> _existing_ array
>>
>> The former is super common, the latter feels like something
>> you reach for in more advanced use cases, honestly. My
>> feeling is that having a "more friendly" overload for what is
>> a pretty advanced method anyway feels some sort of middle
>> ground that would double the size of the API w/o really
>> making it more usable (I don't thing users like you would be
>> bothered by an extra `nativeOrder()` suffix). At least that's
>> what I was assuming.
>>
>> As for (1) - some comments:
>>
>> * you seem to assume that "native implementation" == better
>> performance. I don't think that's the case. And I don't see
>> why you think it'd be more elegant that way. The main thing
>> that needs to be done, if we go down this path, is to keep
>> profiling information sharp - that is, if you call
>> copyToArray with a byte[] and then with a short[] you want C2
>> to work with different type profile information. For this
>> there is a technique in hotspot, called argument type
>> profiling, which is used for pseudo static methods in Unsafe
>> (and in MemoryAccess) which I think we can use here.
>>
>> * I don't see why it has to be a "static" method, or why a
>> static method is perceived to be better. The issue with
>> static methods is that they don't mix with slicing:
>>
>> MemorySegment.copyToArray(sourceSegment.asSlice(), ....)
>>
>> this is worse than:
>>
>> sourceSegment.asSlice().copyToArray(...)
>>
>> (btw, this avoids, at least in part, some of the type
>> profiling pot-holes highlighted above, as receiver type
>> information is always speculated upon, w/o doing anything
>> special). So, two instances methods copyToArray and
>> copyFromArray don't sound less elegant to me (actually more so)
>>
>> * Accepting an Object parameter is inconsistent with e.g. the
>> MemorySegment.of(...) factories, which take sharp array
>> types. In principle those factories could be compressed down
>> into one (in fact the API started off that way, but we went
>> back for performance issues).
>>
>>
>> in both approaches, I think if we use an instance method we
>> should just drop the length parameter from the method and
>> infer the length from the receiver segment. Here's a table I
>> prepared yesterday night thinking about all possible
>> combinations (this assumed approach (1)):
>>
>> * from array to segment
>> - segment does not exist: MemorySegment.of(array)
>> [.asSlice(...)]
>> - segment does exist: segment [.asSlice(...)]
>> .copyFrom(array, index, order)
>> * from segment to array
>> - array does not exist: segment [.asSlice(...)] .toXYZArray()
>> - arrays does exist: segment [.asSlice(...)]
>> .toXYZArray(array, index, order)
>>
>> (this still doesn't seem to be that bad - if toXYZArray name
>> bothers you, just replace that with copyToArray - the rest
>> stays the same).
>>
>> Maurizio
>>
>> On 28/04/2021 01:43, leerho wrote:
>>> Maurizio,
>>>
>>> This is great. You have opened up several new possibilities
>>> and I'd like to explore some of them.
>>>
>>> 1. The use of /Object/, a la /System.arrayCopy(Object src,
>>> int srcPos, Object dest, int destPos, int length)./
>>> There are several interesting ideas (and concerns) that spin
>>> off of this idea.
>>>
>>> 1. First, is that System.arrayCopy() is a native method so
>>> all the work is under the covers, which I cannot see (I
>>> don't know how to find the C++ code that implements this
>>> method.)
>>> * Advantage: This would greatly reduce the number of
>>> methods needed (as you pointed out).
>>> * Concern: Performance. I would think we would be
>>> moving work that could be done at compile time
>>> (e.g., what type of array is it?) to run time. Or
>>> would Hotspot figure this out ?
>>> * Advantage: Potentially this could allow boxed array
>>> types as the native code could just figure this out
>>> and do the right thing.
>>> * Concern: because of its similarity to
>>> /System.arrayCopy()/, the user might be tempted to
>>> use it for array types that are completely
>>> inappropriate. (Just throw an exception.)
>>> 2. This /Object/ idea also led me to: What if we extended
>>> the C++ code underneath arrayCopy() to allow one or both
>>> of the Objects to be a MemorySegment?
>>> * Make it clear that both Pos arguments and the length
>>> argument would have to be in bytes and these
>>> arguments would have to become longs. This fact
>>> alone will kill this idea as it raises all kinds of
>>> issues. Terrible idea, forget this one!
>>> 3. What about this idea. Two /static void arrayCopy-type/
>>> methods in MemorySegment: (or 4 if we overload with
>>> ByteOrder):
>>> * /copyToArray(MemorySegment srcSegment, long
>>> srcOffsetBytes, Object dstArray, int dstIndex, int
>>> dstElements)/;
>>> * /copyFromArray(Object srcArray, int srcIndex, int
>>> srcElements, MemorySegment dstSegment, long
>>> dstOffsetBytes)/;
>>> * If these are implemented as native methods AND if
>>> there is not a runtime performance burden, THIS
>>> would be elegant!
>>>
>>> 2. "split" the overloads between copyFrom and toXYZ arrays.
>>>
>>> * Is this what you had in mind? (7 or 14 of these pairs)
>>> o /int[] dstArray = srcSegment.asSlice(long
>>> offsetBytes, long lengthBytes).toIntArray(int[]
>>> dstArray, int dstIndex, int dstElements)/;
>>> Because toIntArray returns an array, this changes
>>> the behavior whereby the array is provided and
>>> returned. Clunky.
>>> This requires specifying the length twice, once in
>>> bytes, the other in elements. :-(
>>> o /dstSegment.asSlice(long offsetBytes, long
>>> lengthBytes).copyFrom(int[] srcArray, int srcIndex,
>>> int srcElements)/;
>>> This also requires specifying the length twice, once
>>> in bytes, the other in elements. :-(
>>> * Perhaps you meant this? (7 or 14 of these pairs)
>>> o /int[] dstArray = srcSegment.toIntArray(long
>>> srcOffsetBytes, int[] dstArray, int dstIndex, int
>>> dstElements)/;
>>> This still has a clunky return of the dstArray that
>>> isn't really needed. Perhaps this variant would be a
>>> void return?
>>> o /dstSegment.copyFrom(int[] srcArray, int srcIndex,
>>> int srcElements, long dstOffsetBytes)/;
>>>
>>> I think I like 1.3 the best.
>>>
>>> * Only 2 or 4 new methods (let's assume 4)
>>> * But it may require the most under-the-covers work, which
>>> would be in C++. I'm hoping some of the code from
>>> System.arrayCopy() could be leveraged.
>>>
>>> *Note*: I think there is a good reason why MemoryAccess
>>> overloads all the methods with ByteOffset set to the
>>> default: and that is because 90% of the use-cases and users
>>> want the default and don't really want to always be
>>> specifying ByteOrder, even if it is only a static method and
>>> quite fast; it would still be viewed as unnecessary! If we
>>> are choosing between only 2 or 4 added methods, overloading
>>> with the default ByteOrder should not be a big deal.
>>>
>>> Cheers,
>>>
>>> Lee.
>>>
>>> /
>>> /
>>>
>>>
>>> On Tue, Apr 27, 2021 at 2:13 PM Maurizio Cimadamore
>>> <maurizio.cimadamore at oracle.com
>>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>
>>>
>>> On 27/04/2021 21:50, Maurizio Cimadamore wrote:
>>> > is just one native call away, after all.
>>>
>>> Whoops - I meant "static" not, "native", of course :-)
>>>
>>> Maurizio
>>>
More information about the panama-dev
mailing list