[foreign-memaccess+abi] RFR: 8315769: Add support for sliced allocation
Maurizio Cimadamore
mcimadamore at openjdk.org
Wed Sep 6 11:14:36 UTC 2023
On Wed, 6 Sep 2023 10:41:18 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
> This PR adds a new method in `SegmentAllocator`:
>
>
> default MemorySegment allocateFrom(ValueLayout elementLayout, MemorySegment source,
> ValueLayout sourceElementLayout, long sourceOffset, long srcElementCount) {
>
>
> This method allows clients to allocate a new memory segment and copy the contents of a portion of an existing segment into the newly allocated region of memory. As such it can be used to address the following use cases:
>
> * allocate from a `ByteBuffer`
> * allocate from another memory segment
> * allocate from a Java array slice
>
> None of these cases was covered by the existing API points, so developers had to use a more general allocation request (such as `allocate(long, long)`) and then pay a performance cost (because of memory zeroing). In other words, the new method in this PR completes the allocation API by providing a flexible way to allocate a new segment from an existing source (another segment) with a given offset and length.
>
> Given that the new method is more general than the existing array-accepting `allocateFrom`, this PR reimplements the existing array-accepting methods on top of the new overload (and tweaks the javadoc of those methods accordingly).
>
> One detail to note is that the new method takes _two_ element layouts: one is the layout of the newly allocated segment, and the other is the layout of the source segment. The two layouts must have the same alignment and the same carrier, but they can have different endianness (in which case a bulk copy with swap is performed). This is not too different from the most general `MemorySegment::copy` static overload.
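
For illustration, here is a minimal sketch of the use cases listed above (array slice, `ByteBuffer`, and a source with different endianness). It assumes a JDK in which `SegmentAllocator::allocateFrom` is available with the five-argument shape described in this PR (JDK 22 or later is my assumption), and that the source offset is expressed in bytes. The class and method names (`SlicedAllocationSketch`, `fromArraySlice`, `fromByteBuffer`, `fromBigEndian`) are hypothetical and are not part of the PR.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class SlicedAllocationSketch {

    // Copies `count` ints, starting at array index `from`, into a newly allocated segment.
    static int[] fromArraySlice(int[] data, int from, int count) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment source = MemorySegment.ofArray(data);
            // Assumption: the source offset is a byte offset, so scale the index.
            long sourceOffset = from * ValueLayout.JAVA_INT.byteSize();
            MemorySegment copy = arena.allocateFrom(
                    ValueLayout.JAVA_INT, source, ValueLayout.JAVA_INT, sourceOffset, count);
            return copy.toArray(ValueLayout.JAVA_INT);
        }
    }

    // Copies the bytes between the buffer's position and limit into a new segment.
    static byte[] fromByteBuffer(ByteBuffer buffer) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment source = MemorySegment.ofBuffer(buffer);
            MemorySegment copy = arena.allocateFrom(
                    ValueLayout.JAVA_BYTE, source, ValueLayout.JAVA_BYTE, 0, source.byteSize());
            return copy.toArray(ValueLayout.JAVA_BYTE);
        }
    }

    // Writes the values as big-endian ints, then allocates a native-order copy.
    // The two layouts differ only in byte order, so elements are swapped if needed.
    static int[] fromBigEndian(int... values) {
        ValueLayout.OfInt bigEndianInt = ValueLayout.JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment source = arena.allocate(
                    bigEndianInt.byteSize() * values.length, bigEndianInt.byteAlignment());
            for (int i = 0; i < values.length; i++) {
                source.setAtIndex(bigEndianInt, i, values[i]);
            }
            MemorySegment copy = arena.allocateFrom(
                    ValueLayout.JAVA_INT, source, bigEndianInt, 0, values.length);
            return copy.toArray(ValueLayout.JAVA_INT);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fromArraySlice(new int[] {10, 20, 30, 40, 50}, 1, 3)));
        System.out.println(Arrays.toString(fromByteBuffer(ByteBuffer.wrap(new byte[] {1, 2, 3}))));
        System.out.println(Arrays.toString(fromBigEndian(1, 2, 3)));
    }
}
```

In each case the source is first viewed as a `MemorySegment` (`ofArray`, `ofBuffer`), which is what lets a single overload cover all three use cases.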
Here are the benchmark results I got:
Benchmark                                (size)  Mode  Cnt   Score   Error  Units
AllocFromSliceTest.alloc_confined             5  avgt   30  69.853 ± 1.834  ns/op
AllocFromSliceTest.alloc_confined            20  avgt   30  65.872 ± 3.113  ns/op
AllocFromSliceTest.alloc_confined           100  avgt   30  61.102 ± 2.832  ns/op
AllocFromSliceTest.alloc_confined           500  avgt   30  68.474 ± 1.573  ns/op
AllocFromSliceTest.alloc_confined          1000  avgt   30  82.487 ± 1.597  ns/op
AllocFromSliceTest.alloc_confined_slice       5  avgt   30  41.945 ± 0.666  ns/op
AllocFromSliceTest.alloc_confined_slice      20  avgt   30  43.159 ± 1.294  ns/op
AllocFromSliceTest.alloc_confined_slice     100  avgt   30  42.603 ± 0.439  ns/op
AllocFromSliceTest.alloc_confined_slice     500  avgt   30  46.771 ± 1.099  ns/op
AllocFromSliceTest.alloc_confined_slice    1000  avgt   30  58.050 ± 1.484  ns/op
As can be seen, the absence of memory zeroing helps the numbers stay flatter as sizes grow.
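
The benchmark source is not reproduced in this message, so the following is only a guess at the two code shapes being compared: a general allocation followed by a copy, versus a single `allocateFrom` call on a slice of the source. The names `ZeroingComparisonSketch`, `allocateThenCopy` and `allocateFromSlice` are hypothetical, and the sketch assumes a JDK where the five-argument `allocateFrom` is available (JDK 22 or later is my assumption). Whether zeroing is actually skipped depends on the allocator implementation.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical helper class, for illustration only.
class ZeroingComparisonSketch {

    // General path: the arena hands back zero-initialized memory, which is then overwritten.
    static MemorySegment allocateThenCopy(Arena arena, MemorySegment source, long offset, long size) {
        MemorySegment dest = arena.allocate(size, 1);
        MemorySegment.copy(source, offset, dest, 0, size);
        return dest;
    }

    // Sliced path: allocation and copy are expressed as one request, so the
    // allocator is free to skip the initial zeroing.
    static MemorySegment allocateFromSlice(Arena arena, MemorySegment source, long offset, long size) {
        return arena.allocateFrom(
                ValueLayout.JAVA_BYTE, source, ValueLayout.JAVA_BYTE, offset, size);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment source = MemorySegment.ofArray(data);
            MemorySegment a = allocateThenCopy(arena, source, 10, 50);
            MemorySegment b = allocateFromSlice(arena, source, 10, 50);
            // mismatch returns -1 when the two segments have identical contents.
            System.out.println(a.mismatch(b) == -1);
        }
    }
}
```

Both paths produce segments with the same contents; the difference is only in how much work the allocator does before the copy.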
-------------
PR Comment: https://git.openjdk.org/panama-foreign/pull/878#issuecomment-1708110712