RFR: 8338591: Improve performance of MemorySegment::copy
Maurizio Cimadamore
mcimadamore at openjdk.org
Tue Sep 3 15:26:22 UTC 2024
On Tue, 3 Sep 2024 15:22:36 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> This PR proposes to handle smaller FFM copy operations with Java code rather than transitioning to native code. This will improve performance. In this PR, copy operations involving zero to 63 bytes will be handled by Java code.
>>
>> Here is what it looks like for Windows x64:
>>
>> 
>>
>> Here is another chart for Linux a64:
>>
>> 
>>
>> Other platforms exhibit similar behavior. It should be noted that the gain with this PR is pronounced for certain common sizes that are more likely to appear in code (e.g. 8, 16, 24, and 32)
>>
>> It would be possible to use the same code path for the 7arg `MemorySegment::copy` method if it is similar to:
>>
>>
>> MemorySegment.copy(heapSrcSegment, JAVA_BYTE, 0, heapDstSegment, JAVA_BYTE, 0, ELEM_SIZE);
>>
>>
>> This could be added in a separate PR.
>>
>> This PR has been tested with tier1-3 and passed.
>
> src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 637:
>
>> 635: for (; offset < limit; offset += 8) {
>> 636: final long v = SCOPED_MEMORY_ACCESS.getLong(src.sessionImpl(), src.unsafeGetBase(), src.unsafeGetOffset() + srcOffset + offset);
>> 637: SCOPED_MEMORY_ACCESS.putLong(dst.sessionImpl(), dst.unsafeGetBase(), dst.unsafeGetOffset() + dstOffset + offset, v);
>
> I've been digging a bit deeper on why we can get away w/o aligned access (both here and for `fill`). It turns out that, conveniently, var handles use `Unsafe::getXYZUnaligned` for plain set/get. This means that the intrinsics will deal with unaligned access accordingly, depending on the platform (either by splitting into multiple accesses, or by issuing an unaligned access on platforms that support that).
>
> This means we can write "simple" code, and expect C2 to rewrite it the most optimal way. We should probably add a comment in this direction (both for `fill` and `copy`.
Separately, I wonder if it would make sense to put all these "tricky" routines off to one side - e.g. `BulkSegmentSupport`, so that we don't make `AbstractMemorySegmentImpl` even bigger than it is.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20829#discussion_r1742262076
More information about the core-libs-dev
mailing list