Part 1 Proposal for JDK-8264594

Thu Apr 29 04:49:07 UTC 2021

Maurizio,

First, I would like to clear up some misunderstandings.

1. Re overloading with/without ByteOrder:

My feeling is that having a "more friendly" overload for what is a pretty
> advanced method anyway feels some sort of middle ground that would double
> the size of the API w/o really making it more usable (I don't thing users
> like you would be bothered by an extra `nativeOrder()` suffix). At least
> that's what I was assuming.

I was trying to make the distinction between 2 vs 4 methods, assuming
Object array), and 14 vs 28 methods assuming individual XYZArrays.  The
second case is clearly a more egregious impact on the API so I would
definitely agree that not overloading the ByteOrder makes sense. * And yes,
I can live with that :)*

The 2 vs 4 methods refers to the next item.

2. Use of Object array:

Accepting an Object parameter is inconsistent with e.g. the
> MemorySegment.of(...) factories, which take sharp array types. In principle
> those factories could be compressed down into one (in fact the API started
> off that way, but we went back for performance issues).

That is why I posed the question wrt use of Object array:

Concern: Performance.  I would think we would be moving work that could be
> done at compile time (e.g., what type of array is it? ) to run time.   Or
> would Hotspot figure this out ?

I wasn't sure, which is why I asked. And you indicated that performance
with an Object array is an issue *and I believe you* :)

3. Native implementations:

You seem to assume that "native implementation" == better performance.

No, I don't really.  My only comment about native methods is that I don't
know where to find the C++ code so I can't figure out what is going on
under the covers.  Additionally, as you pointed out, the C2 hotspot
compiler has additional profiling it can take advantage of ... and I even
know less about that!   *So I'll leave it to you what methods should be
made native*.

*Comment*: In our Apache DataSketches library we have implemented all of
our sketching algorithms in both C++ and in Java.  And we have experienced
a number of cases where our highly-tuned java implementation handily beats
a casually constructed C++ implementation!   We have found that there are
similar issues in both languages.  If the developer tends to over-rely on
the general-purpose STL librarys in C++ it will impact performance in the
same way that over-reliance on the Java Collections library will.  But a
well-tuned C++ implementation will generally beat a well-tuned Java
implementation, but not by very much!  And with this Panama module, I am
hoping we can make the difference even less.

I think we can agree that the above 3 issues can be put aside and don't
need further discussion..

****
Back to the method design:

I think if we use an instance method we should just drop the length
> parameter from the method and infer the length from the receiver segment.

This does present some usability concerns as I will highlight next.

Thoughts on your 2 X 2 table:
I think we don't need to focus on the two cases where either the segment or
the array does not exist.  These can already be handled with the current
API.  Now for the other two cases:

1. From array to existing segment, you wrote:

> segment [.asSlice(...)] .copyFrom(array, index, order)

My translation:

> dstSegment.asSlice(long dstOffsetBytes, long
> dstLengthBytes).copyFromArray(XYZ[] srcArray, int srcIndex, ByteOrder
> order);

   - The user generally has better knowledge of the array coordinates
   (start, length). Here the user would have to translate from array
   srcElements to dstLengthBytes.  I don't consider this friendly.
   - I don't think you intended to overload the existing copyFrom(segment),
   thus, copyFromArray(...).
   - The new methods *copyFromArray(XYZ[] srcArray, srcIndex, order) are *not
   as general purpose as they would it be if it included the array length
   argument, but leads to the problem of dual specification of the length both
   in bytes and array elements.

2. From segment to existing array, you wrote:

> segment [.asSlice(...)] .toXYZArray(array, index, order)

My translation:

> srcSegment.asSlice(long dstOffsetBytes, long
> dstLengthBytes).copyToArray(XYZ[] dstArray, int dstIndex, ByteOrder order)

   - Again, the user would have to translate from his dstElements to
   dstLengthBytes.
   - Again, the new methods *copyToArray(XYZ[] dstArray, dstIndex, order)*
   are not as general purpose as mentioned above. So the above comment about
   the copyFromArray(...) apply here as well.

My conclusion, so far, is that splitting our task into 7 new
copyFromArray(...) methods and 7 new copyToArray(...) methods that
must operate in a chain, provides a clumsy solution.

*So what is our task?*

What we want our method(s) to accomplish, *at the lowest level,* is
actually very simple.  We want to copy *lengthBytes* from one blob of bytes
to another blob of bytes, each with a specific offset.  As a diagram, it
looks like this:
[image: ArrayCopyForMemorySegment.png]
Let's forget about arrays of XYZ for a moment.  We could accomplish this
with a simple method like :

void copy(source, srcOffsetBytes, destination, dstOffsetBytes, lengthBytes);

The beauty of the unsafe methods is that under the covers, everything is
bytes.  And if you *unsafe.putInt**(offset, int)* into memory, unsafe views
it as just 4 bytes and it forgets that it ever was an integer.

A MemorySegment is just a blob of bytes somewhere in memory.  So if we just
had

void copy(srcSegment, srcOffsetBytes, dstSegment, dstOffsetBytes,
> lengthBytes);

*This would already be a powerful method on its own!  (And we don't have
this!)*

In addition, we would like to enhance this method so that either the source
or the destination could be one of the 7 primitive Java arrays.

The task for the method that needs to do this is trivial: To convert from
array units to bytes is a simple left shift. To convert from bytes to array
units is a simple right-shift instruction.  Both of these are single-cycle
CPU instructions. The *copy* should be a low-level bulk copy operation.
This method should be as fast as a comparable operation in C/C++.  I don't
see any reason why this couldn't be implemented as a native method.  In
fact, the lowest level of this operation could probably be accomplished
with a few machine instructions.  (However, a really powerful bulk array
copy that takes advantage of cache-line sizes and non-aligned offsets is
more complex; but this kind of copy method should already exist in the JVM.)

For arrays, the simplest solution, IMHO, would be 7 copyToArray() methods
and 7 copyFromArray() methods structured as follows:

void copyFromArray(XYZ[] srcArray, int srcStartIndex, int srcElements,
>     MemorySegment dstSegment, long dstOffsetBytes, ByteOrder order);
>
and

> void copyToArray(MemorySegment srcSegment, long srcOffsetBytes,
>     XYZ[] dstArray, int dstStartIndex, int dstElements, ByteOrder order)

   - The simple translation of array units to bytes and the reverse is
   handled by the method.
   Thus, the user does not need to bother doing this, which helps eliminate
   human error.
   - There is no duplication of the length specification, it is only
   specified once.
   - The method names are self explanatory including the relevant units,
   where relevant
   - As these methods are self contained and with no escapes. It could
   easily be a static method.
   - These methods solve the complete problem.  No need to chain multiple
   methods together.
   - These methods could be part of MemorySegment, MemoryAccess, or its own
   class.

Cheers, and thank you for your patience!

Lee.

On Wed, Apr 28, 2021 at 2:49 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Hi Lee
> IMHO some of the disadvantages you mention about 2 are a bit accidental -
> and not dissimilar to what happens here:
>
>
> https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/Collection.html#toArray(T%5B%5D)
>
> But I can understand that this argument might not be persuading you.
>
> For endianness, I think that there is a difference between:
>
> * I have a segment and I want to read an int
> * I have a segment and I want to copy parts of it in an _existing_ array
>
> The former is super common, the latter feels like something you reach for
> in more advanced use cases, honestly. My feeling is that having a "more
> friendly" overload for what is a pretty advanced method anyway feels some
> sort of middle ground that would double the size of the API w/o really
> making it more usable (I don't thing users like you would be bothered by an
> extra `nativeOrder()` suffix). At least that's what I was assuming.
>
> As for (1) - some comments:
>
> * you seem to assume that "native implementation" == better performance. I
> don't think that's the case. And I don't see why you think it'd be more
> elegant that way. The main thing that needs to be done, if we go down this
> path, is to keep profiling information sharp - that is, if you call
> copyToArray with a byte[] and then with a short[] you want C2 to work with
> different type profile information. For this there is a technique in
> hotspot, called argument type profiling, which is used for pseudo static
> methods in Unsafe (and in MemoryAccess) which I think we can use here.
>
> * I don't see why it has to be a "static" method, or why a static method
> is perceived to be better. The issue with static methods is that they don't
> mix with slicing:
>
> MemorySegment.copyToArray(sourceSegment.asSlice(), ....)
>
> this is worse than:
>
> sourceSegment.asSlice().copyToArray(...)
>
> (btw, this avoids, at least in part, some of the type profiling pot-holes
> highlighted above, as receiver type information is always speculated upon,
> w/o doing anything special). So, two instances methods copyToArray and
> copyFromArray don't sound less elegant to me (actually more so)
>
> * Accepting an Object parameter is inconsistent with e.g. the
> MemorySegment.of(...) factories, which take sharp array types. In principle
> those factories could be compressed down into one (in fact the API started
> off that way, but we went back for performance issues).
>
>
> in both approaches, I think if we use an instance method we should just
> drop the length parameter from the method and infer the length from the
> receiver segment. Here's a table I prepared yesterday night thinking about
> all possible combinations (this assumed approach (1)):
>
> * from array to segment
>     - segment does not exist: MemorySegment.of(array) [.asSlice(...)]
>     - segment does exist: segment [.asSlice(...)] .copyFrom(array, index,
> order)
> * from segment to array
>     - array does not exist: segment [.asSlice(...)] .toXYZArray()
>     - arrays does exist: segment [.asSlice(...)] .toXYZArray(array, index,
> order)
>
> (this still doesn't seem to be that bad - if toXYZArray name bothers you,
> just replace that with copyToArray - the rest stays the same).
>
> Maurizio
> On 28/04/2021 01:43, leerho wrote:
>
> Maurizio,
>
> This is great.  You have opened up several new possibilities and I'd like
> to explore some of them.
>
> 1. The use of *Object*, a la *System.arrayCopy(Object src, int srcPos,
> Object dest, int destPos, int length).*
> There are several interesting ideas (and concerns) that spin off of this
> idea.
>
>    1. First, is that System.arrayCopy() is a native method so all the
>    work is under the covers, which I cannot see (I don't know how to find the
>    C++ code that implements this method.)
>       - Advantage: This would greatly reduce the number of methods needed
>       (as you pointed out).
>       - Concern: Performance.  I would think we would be moving work that
>       could be done at compile time (e.g., what type of array is it?) to run
>       time.   Or would Hotspot figure this out ?
>       - Advantage: Potentially this could allow boxed array types as the
>       native code could just figure this out and do the right thing.
>       - Concern: because of its similarity to *System.arrayCopy()*, the
>       user might be tempted to use it for array types that are completely
>       inappropriate.  (Just throw an exception.)
>    2. This *Object* idea also led me to: What if we extended the C++ code
>    underneath arrayCopy() to allow one or both of the Objects to be a
>    MemorySegment?
>       - Make it clear that both Pos arguments and the length argument
>       would have to be in bytes and these arguments would have to become longs.
>       This fact alone will kill this idea as it raises all kinds of issues.
>       Terrible idea, forget this one!
>    3. What about this idea. Two *static void arrayCopy-type* methods in
>    MemorySegment: (or 4 if we overload with ByteOrder):
>       - *copyToArray(MemorySegment srcSegment, long srcOffsetBytes,
>       Object dstArray, int dstIndex, int dstElements)*;
>       - *copyFromArray(Object srcArray, int srcIndex, int srcElements,
>       MemorySegment dstSegment, long dstOffsetBytes)*;
>       - If these are implemented as native methods AND if there is not a
>       runtime performance burden, THIS would be elegant!
>
> 2. "split" the overloads between copyFrom and toXYZ arrays.
>
>    - Is this what you had in mind? (7 or 14 of these pairs)
>       - *int[] dstArray = srcSegment.asSlice(long offsetBytes, long
>       lengthBytes).toIntArray(int[] dstArray, int dstIndex, int dstElements)*
>       ;
>       Because toIntArray returns an array, this changes the behavior
>       whereby the array is provided and returned. Clunky.
>       This requires specifying the length twice, once in bytes, the other
>       in elements. :-(
>       - *dstSegment.asSlice(long offsetBytes, long
>       lengthBytes).copyFrom(int[] srcArray, int srcIndex, int srcElements)*
>       ;
>       This also requires specifying the length twice, once in bytes, the
>       other in elements.  :-(
>    - Perhaps you meant this? (7 or 14 of these pairs)
>       - *int[] dstArray = srcSegment.toIntArray(long
>       srcOffsetBytes, int[] dstArray, int dstIndex, int dstElements)*;
>       This still has a clunky return of the dstArray that isn't really
>       needed. Perhaps this variant would be a void return?
>       - *dstSegment.copyFrom(int[] srcArray, int srcIndex, int
>       srcElements, long dstOffsetBytes)*;
>
> I think I like 1.3 the best.
>
>    - Only 2 or 4 new methods (let's assume 4)
>    - But it may require the most under-the-covers work, which would be in
>    C++.  I'm hoping some of the code from System.arrayCopy() could be
>    leveraged.
>
> *Note*: I think there is a good reason why MemoryAccess overloads all the
> methods with ByteOffset set to the default: and that is because 90% of the
> use-cases and users want the default and don't really want to always be
> specifying ByteOrder, even if it is only a static method and quite fast; it
> would still be viewed as unnecessary!  If we are choosing between only 2 or
> 4 added methods, overloading with the default ByteOrder should not be a big
> deal.
>
> Cheers,
>
> Lee.
>
>
>
>
> On Tue, Apr 27, 2021 at 2:13 PM Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>>
>> On 27/04/2021 21:50, Maurizio Cimadamore wrote:
>> > is just one native call away, after all.
>>
>> Whoops - I meant "static" not, "native", of course :-)
>>
>> Maurizio
>>
>>