Part 1 Proposal for JDK-8264594

Thu May 20 01:34:47 UTC 2021

Maurizio,
Sorry about my radio silence, I was pulled onto another urgent project, but
now I'm back :)
>From your email above:

E.g. let's keep the MemorySegment API thin, by adding only one method:
>

*copyFrom(MemorySegment, MemoryLayout elemLayout, ByteOrder order)*

Which generalizes the existing copyFrom, allowing for swap copy.

I am a little unclear how the user would use this to copy some primitive
array to/from a MemorySegment.  Let me try:

For example: to copy a double array into a MemorySegment would require:

//Given:
int arrLen = N;
double[] srcArr = new double[arrLen];
ResourceScope scope = ...
MemorySegment dstSeg = ... big enough, but we need to load at a dstOffset.

//prepare the source segment:
MemoryLayout doubleLayout = valueLayout(64, ByteOrder.nativeOrder);
MemoryLayout seqLayout = sequenceLayout(arrLen, doubleLayout);
MemorySegment srcSeg = MemorySegment.ofArray(srcArr);

//prepare the destination segment:
dstSegSlice = dstSeg.asSlice(dstOffset, arrLen*8);
dstSeg.copyFrom(srcSeg, seqLayout, ByteOrder.nativeOrder);

Whew!   This can't be right. I must be doing something wrong.  How did you
intend for this new method to be used?

Cheers,

Lee.

On Thu, Apr 29, 2021 at 3:23 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

>
> On 29/04/2021 05:49, leerho wrote:
>
> Maurizio,
>
> First, I would like to clear up some misunderstandings.
>
> 1. Re overloading with/without ByteOrder:
>
> My feeling is that having a "more friendly" overload for what is a pretty
>> advanced method anyway feels some sort of middle ground that would double
>> the size of the API w/o really making it more usable (I don't thing users
>> like you would be bothered by an extra `nativeOrder()` suffix). At least
>> that's what I was assuming.
>
>
> I was trying to make the distinction between 2 vs 4 methods, assuming
> Object array), and 14 vs 28 methods assuming individual XYZArrays.  The
> second case is clearly a more egregious impact on the API so I would
> definitely agree that not overloading the ByteOrder makes sense. * And
> yes, I can live with that :)*
>
> The 2 vs 4 methods refers to the next item.
>
> Yeah - I realized later that you were considering different cost vs.
> benefit ratios in the two proposals - in one case it seems to cost more
> than the other, I agree. (I also agree, in general, that if we could
> provide a ByteOrder-less version, it would be nice to have).
>
>
> 2. Use of Object array:
>
> Accepting an Object parameter is inconsistent with e.g. the
>> MemorySegment.of(...) factories, which take sharp array types. In principle
>> those factories could be compressed down into one (in fact the API started
>> off that way, but we went back for performance issues).
>
>
> That is why I posed the question wrt use of Object array:
>
> Concern: Performance.  I would think we would be moving work that could be
>> done at compile time (e.g., what type of array is it? ) to run time.   Or
>> would Hotspot figure this out ?
>
>
> I wasn't sure, which is why I asked. And you indicated that performance
> with an Object array is an issue *and I believe you* :)
>
> There is a difference between MemorySegment.of(Object) and
> MemorySegment.copyInto(Object). In the first operation we are "creating" a
> new segment; I have observed that, in general, it is better to have
> creation of new instances as a transparent process as possible: if
> different methods create different concrete Java instances (for instance
> MemorySegment::ofArray(int[]) creates a different object than
> MemorySegment::ofArray(double[])) the code tends to be more "transparent"
> to the optimizing compiler, and typically that works better than having a
> single factory which does a bunch of stuff/control flow and might return
> different objects (which will necessarily create some pollution). I believe
> the case of memory copy is not dissimilar from what MemoryAccess.getInt
> does: note that you just pass a `memory segment` - we don't have overloads
> for "heap segment", "mapped segment" and "native segment" and yet C2 is
> able to figure that out (because we turned enhanced profiling on for these
> methods). A similar trick, I'm sure could be done here, if we were willing
> to. Of course we need benchmarks to test and validate the approach.
>
>
> 3. Native implementations:
>
> You seem to assume that "native implementation" == better performance.
>
>
> No, I don't really.  My only comment about native methods is that I don't
> know where to find the C++ code so I can't figure out what is going on
> under the covers.  Additionally, as you pointed out, the C2 hotspot
> compiler has additional profiling it can take advantage of ... and I even
> know less about that!   *So I'll leave it to you what methods should be
> made native*.
>
> So, there are two routines for copying memory: System::arrayCopy and
> Unsafe::copyMemory. Performance-wise they behave the same - I believe one
> minor difference is that System.arrayCopy inserts some GC safe point so
> that, in the middle of a big copy, it can still pause for GC -
> Unsafe::copyMemory (the one we use) doesn't do that. They are both native
> routines, and are both "intrinsified" by the VM - which means that when
> they are called enough time, the VM replaces them with bits of machine code
> generated by the optimizing compiler. This is where information about types
> becomes crucial: if, when the intrinsification occurs, C2 can prove that
> e.g. you are copying from off-heap into a int[], then a sharper copy can be
> performed. If the types are opaque (so C2 doesn't know whether they are
> on/off heap), then performance suffers. So, as long as the types are kept
> "sharp", this is typically not an issue.
>
> There's also another important consideration: copyMemory (or
> System::arrayCopy) is, relatively speaking, a "slow" operation (since it
> can touch many addresses). In my experience it is way less likely to
> observe performance issues with this routines than it is with a single-shot
> memory access (which boils down to a single MOV, so every extra overhead
> counts big time!).
>
>
>
> *Comment*: In our Apache DataSketches library we have implemented all of
> our sketching algorithms in both C++ and in Java.  And we have experienced
> a number of cases where our highly-tuned java implementation handily beats
> a casually constructed C++ implementation!   We have found that there are
> similar issues in both languages.  If the developer tends to over-rely on
> the general-purpose STL librarys in C++ it will impact performance in the
> same way that over-reliance on the Java Collections library will.  But a
> well-tuned C++ implementation will generally beat a well-tuned Java
> implementation, but not by very much!  And with this Panama module, I am
> hoping we can make the difference even less.
>
> I think we can agree that the above 3 issues can be put aside and don't
> need further discussion..
>
> ok
>
>
> ****
> Back to the method design:
>
> I think if we use an instance method we should just drop the length
>> parameter from the method and infer the length from the receiver segment.
>
>
> This does present some usability concerns as I will highlight next.
>
> Thoughts on your 2 X 2 table:
> I think we don't need to focus on the two cases where either the segment
> or the array does not exist.
>
> Sure - I was showing them to demonstrate the emerging symmetry/pattern
>
>
> These can already be handled with the current API.  Now for the other two
> cases:
>
> 1. From array to existing segment, you wrote:
>
>> segment [.asSlice(...)] .copyFrom(array, index, order)
>
>
> My translation:
>
>> dstSegment.asSlice(long dstOffsetBytes, long
>> dstLengthBytes).copyFromArray(XYZ[] srcArray, int srcIndex, ByteOrder
>> order);
>
>
>    - The user generally has better knowledge of the array coordinates
>    (start, length). Here the user would have to translate from array
>    srcElements to dstLengthBytes.  I don't consider this friendly.
>    - I don't think you intended to overload the existing
>    copyFrom(segment), thus, copyFromArray(...).
>    - The new methods *copyFromArray(XYZ[] srcArray, srcIndex, order) are *not
>    as general purpose as they would it be if it included the array length
>    argument, but leads to the problem of dual specification of the length both
>    in bytes and array elements.
>
> I think of these three I can see the point of the first bullet - but it
> feels like having offset/length on the copyFrom feels slightly redundant
> (since asSlice does kind of the same thing, and that's what we tell people
> to do for segment->segment copyFrom). Anyway, point taken.
>
> 2. From segment to existing array, you wrote:
>
>> segment [.asSlice(...)] .toXYZArray(array, index, order)
>
>
> My translation:
>
>> srcSegment.asSlice(long dstOffsetBytes, long
>> dstLengthBytes).copyToArray(XYZ[] dstArray, int dstIndex, ByteOrder order)
>
>
>    - Again, the user would have to translate from his dstElements to
>    dstLengthBytes.
>    - Again, the new methods *copyToArray(XYZ[] dstArray, dstIndex, order)*
>    are not as general purpose as mentioned above. So the above comment about
>    the copyFromArray(...) apply here as well.
>
> My conclusion, so far, is that splitting our task into 7 new
> copyFromArray(...) methods and 7 new copyToArray(...) methods that
> must operate in a chain, provides a clumsy solution.
>
> I think "clumsy" here might be a bit of a glass-half-empty comment. And
> some points as to why the glass is perceived to be half empty are probably
> well taken (e.g. translation of coordinates) but I think there is a general
> "I don't want to slice" feel to all this, which sounds like jumping into
> premature optimization. I'd like to make sure we resist that temptation, no
> matter what we end up doing in the end.
>
>
>
>
> *So what is our task?*
>
> What we want our method(s) to accomplish, *at the lowest level,* is
> actually very simple.  We want to copy *lengthBytes* from one blob of
> bytes to another blob of bytes, each with a specific offset.  As a diagram,
> it looks like this:
> [image: ArrayCopyForMemorySegment.png]
> Let's forget about arrays of XYZ for a moment.  We could accomplish this
> with a simple method like :
>
> void copy(source, srcOffsetBytes, destination, dstOffsetBytes,
>> lengthBytes);
>
>
> The beauty of the unsafe methods is that under the covers, everything is
> bytes.  And if you *unsafe.putInt**(offset, int)* into memory, unsafe
> views it as just 4 bytes and it forgets that it ever was an integer.
>
> A MemorySegment is just a blob of bytes somewhere in memory.  So if we
> just had
>
> void copy(srcSegment, srcOffsetBytes, dstSegment, dstOffsetBytes,
>> lengthBytes);
>
>
> *This would already be a powerful method on its own!  (And we don't have
> this!)*
>
>
> Ok, now the thing I was afraid of (premature optimization) is sneaking in
> full - it's not true we don't have support for the above - we do, but not
> in the way you described:
>
> sourceSegment.slice(....).copyFrom(destSegment.slice(...))
>
> Combining copyFrom and asSlice you can obtain whatever combination of
> copying you can obtain with low level Unsafe::copyMemory.
>
> We're not talking about missing primitives here (it's all in there). And,
> simplicity is often in the eye of the beholder: you seem to imply that a
> method taking 5 (!!) arguments is better than having a 1-arg method and
> moving the slicing closer to where it belongs. I think I object to that. In
> the slicing version there's no ambiguity as to what parameters refer to.
>
> The main reason as to why I would see someone prefer the 5 argument
> version is, again, some notion that slicing costs too much.
>
>
> Anyway - copying segment to segment is a completely different problems
> from copying segments into an existing array. I believe the former is
> addressed by the API (and I don't see a significant margin of improvements
> there). In the latter case, I agree there is too much convoluted-ness going
> on, with creating a temp segment for holding heap memory and what's not.
>
>
> In addition, we would like to enhance this method so that either the
> source or the destination could be one of the 7 primitive Java arrays.
>
> The task for the method that needs to do this is trivial: To convert from
> array units to bytes is a simple left shift. To convert from bytes to array
> units is a simple right-shift instruction.  Both of these are single-cycle
> CPU instructions. The *copy* should be a low-level bulk copy operation.
> This method should be as fast as a comparable operation in C/C++.  I don't
> see any reason why this couldn't be implemented as a native method.  In
> fact, the lowest level of this operation could probably be accomplished
> with a few machine instructions.  (However, a really powerful bulk array
> copy that takes advantage of cache-line sizes and non-aligned offsets is
> more complex; but this kind of copy method should already exist in the JVM.)
>
> For arrays, the simplest solution, IMHO, would be 7 copyToArray() methods
> and 7 copyFromArray() methods structured as follows:
>
> void copyFromArray(XYZ[] srcArray, int srcStartIndex, int srcElements,
>>     MemorySegment dstSegment, long dstOffsetBytes, ByteOrder order);
>>
> and
>
>> void copyToArray(MemorySegment srcSegment, long srcOffsetBytes,
>>     XYZ[] dstArray, int dstStartIndex, int dstElements, ByteOrder order)
>
>
>    - The simple translation of array units to bytes and the reverse is
>    handled by the method.
>    Thus, the user does not need to bother doing this, which helps
>    eliminate human error.
>    - There is no duplication of the length specification, it is only
>    specified once.
>    - The method names are self explanatory including the relevant units,
>    where relevant
>    - As these methods are self contained and with no escapes. It could
>    easily be a static method.
>    - These methods solve the complete problem.  No need to chain multiple
>    methods together.
>    - These methods could be part of MemorySegment, MemoryAccess, or its
>    own class.
>
> If this path is preferred, and we also join in with Radoslaw comment, I
> think I'd like to add all the required overloads (including ByteOrder-less
> ones) into a new class: MemoryCopy.
>
> E.g. let's keep the MemorySegment API thin, by adding only one method:
>
> copyFrom(MemorySegment, MemoryLayout elemLayout, ByteOrder order)
>
> Which generalizes the existing copyFrom, allowing for swap copy.
>
> And then let's build on this, and provide several static methods on the
> side, like we do with MemoryAccess.
>
> Maurizio
>
>
> Cheers, and thank you for your patience!
>
> Lee.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Wed, Apr 28, 2021 at 2:49 AM Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>> Hi Lee
>> IMHO some of the disadvantages you mention about 2 are a bit accidental -
>> and not dissimilar to what happens here:
>>
>>
>> https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/Collection.html#toArray(T%5B%5D)
>>
>> But I can understand that this argument might not be persuading you.
>>
>> For endianness, I think that there is a difference between:
>>
>> * I have a segment and I want to read an int
>> * I have a segment and I want to copy parts of it in an _existing_ array
>>
>> The former is super common, the latter feels like something you reach for
>> in more advanced use cases, honestly. My feeling is that having a "more
>> friendly" overload for what is a pretty advanced method anyway feels some
>> sort of middle ground that would double the size of the API w/o really
>> making it more usable (I don't thing users like you would be bothered by an
>> extra `nativeOrder()` suffix). At least that's what I was assuming.
>>
>> As for (1) - some comments:
>>
>> * you seem to assume that "native implementation" == better performance.
>> I don't think that's the case. And I don't see why you think it'd be more
>> elegant that way. The main thing that needs to be done, if we go down this
>> path, is to keep profiling information sharp - that is, if you call
>> copyToArray with a byte[] and then with a short[] you want C2 to work with
>> different type profile information. For this there is a technique in
>> hotspot, called argument type profiling, which is used for pseudo static
>> methods in Unsafe (and in MemoryAccess) which I think we can use here.
>>
>> * I don't see why it has to be a "static" method, or why a static method
>> is perceived to be better. The issue with static methods is that they don't
>> mix with slicing:
>>
>> MemorySegment.copyToArray(sourceSegment.asSlice(), ....)
>>
>> this is worse than:
>>
>> sourceSegment.asSlice().copyToArray(...)
>>
>> (btw, this avoids, at least in part, some of the type profiling pot-holes
>> highlighted above, as receiver type information is always speculated upon,
>> w/o doing anything special). So, two instances methods copyToArray and
>> copyFromArray don't sound less elegant to me (actually more so)
>>
>> * Accepting an Object parameter is inconsistent with e.g. the
>> MemorySegment.of(...) factories, which take sharp array types. In principle
>> those factories could be compressed down into one (in fact the API started
>> off that way, but we went back for performance issues).
>>
>>
>> in both approaches, I think if we use an instance method we should just
>> drop the length parameter from the method and infer the length from the
>> receiver segment. Here's a table I prepared yesterday night thinking about
>> all possible combinations (this assumed approach (1)):
>>
>> * from array to segment
>>     - segment does not exist: MemorySegment.of(array) [.asSlice(...)]
>>     - segment does exist: segment [.asSlice(...)] .copyFrom(array, index,
>> order)
>> * from segment to array
>>     - array does not exist: segment [.asSlice(...)] .toXYZArray()
>>     - arrays does exist: segment [.asSlice(...)] .toXYZArray(array,
>> index, order)
>>
>> (this still doesn't seem to be that bad - if toXYZArray name bothers you,
>> just replace that with copyToArray - the rest stays the same).
>>
>> Maurizio
>> On 28/04/2021 01:43, leerho wrote:
>>
>> Maurizio,
>>
>> This is great.  You have opened up several new possibilities and I'd like
>> to explore some of them.
>>
>> 1. The use of *Object*, a la *System.arrayCopy(Object src, int srcPos,
>> Object dest, int destPos, int length).*
>> There are several interesting ideas (and concerns) that spin off of this
>> idea.
>>
>>    1. First, is that System.arrayCopy() is a native method so all the
>>    work is under the covers, which I cannot see (I don't know how to find the
>>    C++ code that implements this method.)
>>       - Advantage: This would greatly reduce the number of methods
>>       needed (as you pointed out).
>>       - Concern: Performance.  I would think we would be moving work
>>       that could be done at compile time (e.g., what type of array is it?) to run
>>       time.   Or would Hotspot figure this out ?
>>       - Advantage: Potentially this could allow boxed array types as the
>>       native code could just figure this out and do the right thing.
>>       - Concern: because of its similarity to *System.arrayCopy()*, the
>>       user might be tempted to use it for array types that are completely
>>       inappropriate.  (Just throw an exception.)
>>    2. This *Object* idea also led me to: What if we extended the C++
>>    code underneath arrayCopy() to allow one or both of the Objects to be a
>>    MemorySegment?
>>       - Make it clear that both Pos arguments and the length argument
>>       would have to be in bytes and these arguments would have to become longs.
>>       This fact alone will kill this idea as it raises all kinds of issues.
>>       Terrible idea, forget this one!
>>    3. What about this idea. Two *static void arrayCopy-type* methods in
>>    MemorySegment: (or 4 if we overload with ByteOrder):
>>       - *copyToArray(MemorySegment srcSegment, long srcOffsetBytes,
>>       Object dstArray, int dstIndex, int dstElements)*;
>>       - *copyFromArray(Object srcArray, int srcIndex, int srcElements,
>>       MemorySegment dstSegment, long dstOffsetBytes)*;
>>       - If these are implemented as native methods AND if there is not a
>>       runtime performance burden, THIS would be elegant!
>>
>> 2. "split" the overloads between copyFrom and toXYZ arrays.
>>
>>    - Is this what you had in mind? (7 or 14 of these pairs)
>>       - *int[] dstArray = srcSegment.asSlice(long offsetBytes, long
>>       lengthBytes).toIntArray(int[] dstArray, int dstIndex, int dstElements)*
>>       ;
>>       Because toIntArray returns an array, this changes the behavior
>>       whereby the array is provided and returned. Clunky.
>>       This requires specifying the length twice, once in bytes, the
>>       other in elements. :-(
>>       - *dstSegment.asSlice(long offsetBytes, long
>>       lengthBytes).copyFrom(int[] srcArray, int srcIndex, int srcElements)*
>>       ;
>>       This also requires specifying the length twice, once in bytes, the
>>       other in elements.  :-(
>>    - Perhaps you meant this? (7 or 14 of these pairs)
>>       - *int[] dstArray = srcSegment.toIntArray(long
>>       srcOffsetBytes, int[] dstArray, int dstIndex, int dstElements)*;
>>       This still has a clunky return of the dstArray that isn't really
>>       needed. Perhaps this variant would be a void return?
>>       - *dstSegment.copyFrom(int[] srcArray, int srcIndex, int
>>       srcElements, long dstOffsetBytes)*;
>>
>> I think I like 1.3 the best.
>>
>>    - Only 2 or 4 new methods (let's assume 4)
>>    - But it may require the most under-the-covers work, which would be
>>    in C++.  I'm hoping some of the code from System.arrayCopy() could be
>>    leveraged.
>>
>> *Note*: I think there is a good reason why MemoryAccess overloads all
>> the methods with ByteOffset set to the default: and that is because 90% of
>> the use-cases and users want the default and don't really want to always be
>> specifying ByteOrder, even if it is only a static method and quite fast; it
>> would still be viewed as unnecessary!  If we are choosing between only 2 or
>> 4 added methods, overloading with the default ByteOrder should not be a big
>> deal.
>>
>> Cheers,
>>
>> Lee.
>>
>>
>>
>>
>> On Tue, Apr 27, 2021 at 2:13 PM Maurizio Cimadamore <
>> maurizio.cimadamore at oracle.com> wrote:
>>
>>>
>>> On 27/04/2021 21:50, Maurizio Cimadamore wrote:
>>> > is just one native call away, after all.
>>>
>>> Whoops - I meant "static" not, "native", of course :-)
>>>
>>> Maurizio
>>>
>>>