[foreign-memaccess+abi] RFR: 8270376: Finalize API for memory copy

Tue Jul 13 18:23:31 UTC 2021

On Tue, 13 Jul 2021 16:49:06 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> This patch ties up some loose ends with the MemoryCopy API, and it also prepares the ground for some other related refactorings in this area.
> 
> The meat of this patch is represented by the various changes in MemoryCopy, where all methods were renamed to simply `copy`, and the length parameter (now called `elementCount`) is always moved to the end.
> 
> I've also simplified naming of parameters, as I think distinguishing between `index` and `offset` is enough (e.g. an array has an index, a segment has an offset).
> 
> You will see that, at the very end of the class, three more copy methods have been added, which deal fully in terms of segments. I have also moved the complex layout-based memory segment copy operation (which does the swap) as a static method in this class, as I believe the static form makes the method more regular and usable.
> 
> I've made some changes to our uses of `copyFrom` in the linker, to use the new methods in `MemoryCopy` when the copy operation was performing slicing in the source/target segment, which I think makes the code more readable. Of course these changes are completely optional and could be omitted as well.

You might ask if it still makes sense to keep `MemorySegment::copyFrom` (at least I have asked myself that :-) ). I think currently our API is not very consistent in terms of static vs. non-static boundaries. For instance, all the API to do memory access is defined in terms of static methods in the `MemoryAccess` class. For bulk copy, things are more convoluted, as there is now a static class `MemoryCopy`, but also instance methods in `MemorySegment`, such as `copyFrom`, but also `toXYZArray` (which is the dual case).

To add to the confusion, the SegmentAllocator interface also defines its own suite of allocator methods that can be used to set either a single value, or an array of values - this idiom is used to facilitate native interop code, so that one can do:

MemorySegment segment = allocator.allocateArray(C_FLOAT, new float[] { ... });

or:

MemorySegment segment = allocator.allocateArray(C_FLOAT, 4.0f);

And do memory allocation *and* initialization in a single shot. This can be problematic, as single dereference might not be very well optimized (as var handles are created on the fly), and also because it creates an imbalance in the API, between clients that have a `SegmentAllocator` handy and those which don't.

In principle, we could add all the required access and bulk copy methods as instance methods in `MemorySegment` and ditch the static classes - this would help the `SegmentAllocator` cause too.

In practice, however, I believe that it is not realistic to think that we can put *all* of the `MemoryAccess` and `MemoryCopy` API as instance methods inside `MemorySegment`. While this works for `ByteBuffer` (although there are still quite a bit of methods), it fails to scale here, as:

* every operation needs an endianness-ful overload (meaning method size goes 2x)
* here we need (at least) a pair of bulk ops per carrier, while `ByteBuffer` only provides those for `byte[]`.

For these reasons, I still think that offloading dereference and copy methods on side classes is a good move, so that the `MemorySegment` API can remain relatively simple.

So, what should we do with instance methods like `copyFrom` and `toArray`? I think these methods do have a role in the API - which is that of facilitating transfer of data from and to the Java heap. For instance:

MemorySegment segment = MemorySegment.allocateNative(16)
                         .copyFrom(new float [] { ... }); // from Java to native
...
float[] = segment.toFloatArray(); // from native to Java

The above feels like a common enough pattern when marshalling data structures in and out of the Java heap that, perhaps it still justifies the cost of inserting additional instance methods in the API. Another option would be to just provide more `allocateNative` factories:

MemorySegment segment = MemorySegment.allocateNative(16, new float [] { ... }); // from Java to native
...
float[] = segment.toFloatArray(); // from native to Java

While this is nice, if we go down this path we'd have to replicate everything inside `SegmentAllocator` (otherwise clients using `SegmentAllocator` would be worse off).

Overall, I believe the best course of action would be to:

1. introduce many `MemorySegment::copyFrom` instance, in a 1-1 with `MemorySegment::toXYZArray`
2. simplify the `SegmentAllocator` interface to just be about segment allocation (and leave initialization to other APIs)
3. for complex copy cases which do not fit in the simple `copyFrom` format, users will fallback to `MemoryCopy`

-------------

PR: https://git.openjdk.java.net/panama-foreign/pull/568