RFR: 8254693: Add Panama feature to pass heap segments to native code [v4]
Jorn Vernee
jvernee at openjdk.org
Wed Oct 18 12:45:07 UTC 2023
On Wed, 18 Oct 2023 09:42:27 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
>> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call.
>>
>> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are currently relying on Get-/ReleasePrimitiveArrayCritical from JNI, where making a copy of the array would be prohibitively expensive.
>>
>> Components of this patch:
>>
>> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`.
>> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed.
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified.
>> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures.
>> - The object/oop + offset is exposed as a temporary address to native code.
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code).
>> - Only x64 and AArch64 for now.
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866
>> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`.
>> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical`
>>
>> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode.
>>
>> Numbers for the included benchmark on my machine are:
>>
>>
>> Benchmar...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
> Phrasing
>
> Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com>
Added another benchmark to the patch that xors 2 arrays together using various strategies. These are the results on my machine:
Benchmark    (arrayKind)       (sizeKind)  Mode  Cnt   Score   Error  Units
XorTest.xor  JNI_ELEMENTS      SMALL       avgt   30   0.555 ± 0.010  ms/op
XorTest.xor  JNI_ELEMENTS      MEDIUM      avgt   30   4.610 ± 0.114  ms/op
XorTest.xor  JNI_ELEMENTS      LARGE       avgt   30  53.533 ± 2.113  ms/op
XorTest.xor  JNI_REGION        SMALL       avgt   30   0.030 ± 0.001  ms/op
XorTest.xor  JNI_REGION        MEDIUM      avgt   30   1.498 ± 0.041  ms/op
XorTest.xor  JNI_REGION        LARGE       avgt   30   7.544 ± 0.188  ms/op
XorTest.xor  JNI_CRITICAL      SMALL       avgt   30   0.035 ± 0.005  ms/op
XorTest.xor  JNI_CRITICAL      MEDIUM      avgt   30   0.496 ± 0.003  ms/op
XorTest.xor  JNI_CRITICAL      LARGE       avgt   30   2.521 ± 0.035  ms/op
XorTest.xor  FOREIGN_NO_INIT   SMALL       avgt   30   0.030 ± 0.001  ms/op
XorTest.xor  FOREIGN_NO_INIT   MEDIUM      avgt   30   1.303 ± 0.021  ms/op
XorTest.xor  FOREIGN_NO_INIT   LARGE       avgt   30   7.668 ± 0.168  ms/op
XorTest.xor  FOREIGN_INIT      SMALL       avgt   30   0.031 ± 0.001  ms/op
XorTest.xor  FOREIGN_INIT      MEDIUM      avgt   30   1.485 ± 0.012  ms/op
XorTest.xor  FOREIGN_INIT      LARGE       avgt   30   9.183 ± 0.247  ms/op
XorTest.xor  FOREIGN_CRITICAL  SMALL       avgt   30   0.026 ± 0.001  ms/op
XorTest.xor  FOREIGN_CRITICAL  MEDIUM      avgt   30   0.501 ± 0.002  ms/op
XorTest.xor  FOREIGN_CRITICAL  LARGE       avgt   30   2.578 ± 0.023  ms/op
XorTest.xor  UNSAFE            SMALL       avgt   30   0.029 ± 0.001  ms/op
XorTest.xor  UNSAFE            MEDIUM      avgt   30   1.300 ± 0.013  ms/op
XorTest.xor  UNSAFE            LARGE       avgt   30   7.632 ± 0.178  ms/op
The important part here is that `FOREIGN_CRITICAL` (the new feature) is on par with `JNI_CRITICAL`.
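
For readers who have not seen the option in use, here is a minimal sketch of what a `FOREIGN_CRITICAL`-style call could look like from the Java side. It is not the benchmark code from the patch. It assumes a JDK where the FFM API is final (22 or later), and that the C library's `strlen` can be found through the linker's default lookup; the class and method names (`HeapSegmentDowncall`, `nativeStrlen`) are made up for illustration.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the patch.
public class HeapSegmentDowncall {

    // Calls the C function size_t strlen(const char*) directly on a Java byte[].
    // Assumption: strlen is visible through the default lookup, and size_t is
    // 64 bits wide on the platform (so JAVA_LONG matches it).
    static long nativeStrlen(byte[] nulTerminated) throws Throwable {
        Linker linker = Linker.nativeLinker();
        MemorySegment strlen = linker.defaultLookup().find("strlen").orElseThrow();

        // critical(true) marks the call as critical and allows heap segments
        // to be passed for ADDRESS parameters.
        MethodHandle handle = linker.downcallHandle(
                strlen,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS),
                Linker.Option.critical(true));

        // A heap segment backed by the array itself; no off-heap copy is made here.
        MemorySegment heap = MemorySegment.ofArray(nulTerminated);
        return (long) handle.invokeExact(heap);
    }

    public static void main(String[] args) throws Throwable {
        byte[] text = "panama\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(nativeStrlen(text));
    }
}
```

The same restrictions as for other critical calls apply to the target function: it must not upcall into Java and should return quickly. Without `Linker.Option.critical(true)`, passing the heap segment to the downcall handle is rejected at call time.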
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768370164