[foreign-memaccess+abi] RFR: 8315769: Add support for sliced allocation

Maurizio Cimadamore mcimadamore at openjdk.org
Wed Sep 6 11:14:36 UTC 2023


On Wed, 6 Sep 2023 10:41:18 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> This PR adds a new method in `SegmentAllocator`:
> 
> 
> default MemorySegment allocateFrom(ValueLayout elementLayout, MemorySegment source,
>                                        ValueLayout sourceElementLayout, long sourceOffset, long srcElementCount) {
> 
> 
> This method allows clients to allocate a new memory segment and copy the contents of a portion of an existing segment into the newly allocated region of memory. As such it can be used to address the following use cases:
> 
> * allocate from a `ByteBuffer`
> * allocate from another memory segment
> * allocate from a Java array slice
> 
> None of these cases was covered by the existing API, which meant that developers had to use a more general allocation request (such as `allocate(long, long)`) and then pay a performance cost for memory zeroing, even though the subsequent copy overwrites the zeroed memory. In other words, the new method in this PR completes the allocation API by providing a flexible way to allocate a new segment from an existing source (another segment) with a given offset and length.
> 
> Given that the new method is more general than the existing array-accepting `allocateFrom` methods, this PR reimplements those methods on top of the new overload (and tweaks their javadoc accordingly).
> 
> One detail to note is that the new method takes _two_ element layouts: one is the layout of the newly allocated segment, whereas the other is the layout of the source segment. The two layouts must have the same alignment and the same carrier, but they can have different endianness (in which case a bulk copy with swap is performed). This is not too different from the most general `MemorySegment::copy` static overload.
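
For illustration, here is a minimal sketch of how the overload described above might be used for the array-slice and `ByteBuffer` cases, plus a copy that changes byte order. It assumes the method has the shape it has in the finalized `java.lang.foreign` API (Java 22 or later), where `sourceOffset` is expressed in bytes and `Arena` is a `SegmentAllocator`. The class and helper names are hypothetical and not part of the PR.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, for illustration only.
public class AllocateFromSliceSketch {

    // Allocate from a Java array slice: copies `count` ints starting at index `from`.
    public static MemorySegment fromArraySlice(Arena arena, int[] values, int from, int count) {
        MemorySegment source = MemorySegment.ofArray(values);
        // Assumption: the source offset is given in bytes, not in elements.
        long sourceOffset = (long) from * ValueLayout.JAVA_INT.byteSize();
        return arena.allocateFrom(ValueLayout.JAVA_INT, source,
                                  ValueLayout.JAVA_INT, sourceOffset, count);
    }

    // Allocate from a ByteBuffer: the wrapping segment spans the buffer's position..limit.
    public static MemorySegment fromBuffer(Arena arena, ByteBuffer buffer) {
        MemorySegment source = MemorySegment.ofBuffer(buffer);
        return arena.allocateFrom(ValueLayout.JAVA_BYTE, source,
                                  ValueLayout.JAVA_BYTE, 0, source.byteSize());
    }

    // Same slice copy, but the new segment stores its ints in big-endian order.
    // If the source layout's byte order differs, the copy swaps bytes in bulk.
    public static MemorySegment fromArraySliceBigEndian(Arena arena, int[] values, int from, int count) {
        MemorySegment source = MemorySegment.ofArray(values);
        long sourceOffset = (long) from * ValueLayout.JAVA_INT.byteSize();
        return arena.allocateFrom(ValueLayout.JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN), source,
                                  ValueLayout.JAVA_INT, sourceOffset, count);
    }
}
```

Reading the big-endian segment back with `ValueLayout.JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN)` should yield the original values regardless of the platform's native byte order.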

Here are the benchmark results I got:


Benchmark                                (size)  Mode  Cnt   Score   Error  Units
AllocFromSliceTest.alloc_confined             5  avgt   30  69.853 ± 1.834  ns/op
AllocFromSliceTest.alloc_confined            20  avgt   30  65.872 ± 3.113  ns/op
AllocFromSliceTest.alloc_confined           100  avgt   30  61.102 ± 2.832  ns/op
AllocFromSliceTest.alloc_confined           500  avgt   30  68.474 ± 1.573  ns/op
AllocFromSliceTest.alloc_confined          1000  avgt   30  82.487 ± 1.597  ns/op
AllocFromSliceTest.alloc_confined_slice       5  avgt   30  41.945 ± 0.666  ns/op
AllocFromSliceTest.alloc_confined_slice      20  avgt   30  43.159 ± 1.294  ns/op
AllocFromSliceTest.alloc_confined_slice     100  avgt   30  42.603 ± 0.439  ns/op
AllocFromSliceTest.alloc_confined_slice     500  avgt   30  46.771 ± 1.099  ns/op
AllocFromSliceTest.alloc_confined_slice    1000  avgt   30  58.050 ± 1.484  ns/op


As can be seen, the absence of any memory zeroing helps the numbers stay flatter as the size grows.
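
The benchmark source is not included in this message, so the following is only a sketch of the two allocation shapes that the PR description contrasts: a general `allocate(long, long)` request followed by a copy, versus the sliced `allocateFrom` overload. It assumes the finalized `java.lang.foreign` API (Java 22 or later). The class and method names are hypothetical and do not correspond to the actual `AllocFromSliceTest` code.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration, not the benchmark itself.
public class ZeroingVsSliceSketch {

    // General allocation request followed by a copy: the arena zeroes the new
    // segment first, and the copy then overwrites every one of those bytes.
    public static MemorySegment allocateThenCopy(Arena arena, MemorySegment source,
                                                 long offset, long size) {
        MemorySegment dest = arena.allocate(size, 1);
        MemorySegment.copy(source, offset, dest, 0, size);
        return dest;
    }

    // Sliced allocation: per the PR description, the allocator can skip zeroing
    // because the copied source bytes cover the whole new segment.
    public static MemorySegment allocateFromSlice(Arena arena, MemorySegment source,
                                                  long offset, long size) {
        return arena.allocateFrom(ValueLayout.JAVA_BYTE, source,
                                  ValueLayout.JAVA_BYTE, offset, size);
    }
}
```

Both methods produce segments with identical contents; the only intended difference is whether the zeroing pass happens before the copy.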

-------------

PR Comment: https://git.openjdk.org/panama-foreign/pull/878#issuecomment-1708110712

