RFR: 8254693: Add Panama feature to pass heap segments to native code [v4]

Jorn Vernee jvernee at openjdk.org
Wed Oct 18 12:45:07 UTC 2023


On Wed, 18 Oct 2023 09:42:27 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call.
>> 
>> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte arrays to native code and currently rely on Get-/ReleasePrimitiveArrayCritical from JNI, where making a copy of the array would be prohibitively expensive.
>> 
>> Components of this patch:
>> 
>> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`.
>> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed.
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified.
>> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures.
>> - The object/oop + offset is exposed as a temporary address to native code.
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code).
>> - Only x64 and AArch64 for now.
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866
>> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`.
>> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical`
>> 
>> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode.
>> 
>> Numbers for the included benchmark on my machine are:
>> 
>> 
>> Benchmar...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Phrasing
>   
>   Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com>

Added another benchmark to the patch that xors 2 arrays together using various strategies. These are the results on my machine:


Benchmark         (arrayKind)  (sizeKind)  Mode  Cnt   Score    Error  Units
XorTest.xor      JNI_ELEMENTS       SMALL  avgt   30   0.555 ±  0.010  ms/op
XorTest.xor      JNI_ELEMENTS      MEDIUM  avgt   30   4.610 ±  0.114  ms/op
XorTest.xor      JNI_ELEMENTS       LARGE  avgt   30  53.533 ±  2.113  ms/op
XorTest.xor        JNI_REGION       SMALL  avgt   30   0.030 ±  0.001  ms/op
XorTest.xor        JNI_REGION      MEDIUM  avgt   30   1.498 ±  0.041  ms/op
XorTest.xor        JNI_REGION       LARGE  avgt   30   7.544 ±  0.188  ms/op
XorTest.xor      JNI_CRITICAL       SMALL  avgt   30   0.035 ±  0.005  ms/op
XorTest.xor      JNI_CRITICAL      MEDIUM  avgt   30   0.496 ±  0.003  ms/op
XorTest.xor      JNI_CRITICAL       LARGE  avgt   30   2.521 ±  0.035  ms/op
XorTest.xor   FOREIGN_NO_INIT       SMALL  avgt   30   0.030 ±  0.001  ms/op
XorTest.xor   FOREIGN_NO_INIT      MEDIUM  avgt   30   1.303 ±  0.021  ms/op
XorTest.xor   FOREIGN_NO_INIT       LARGE  avgt   30   7.668 ±  0.168  ms/op
XorTest.xor      FOREIGN_INIT       SMALL  avgt   30   0.031 ±  0.001  ms/op
XorTest.xor      FOREIGN_INIT      MEDIUM  avgt   30   1.485 ±  0.012  ms/op
XorTest.xor      FOREIGN_INIT       LARGE  avgt   30   9.183 ±  0.247  ms/op
XorTest.xor  FOREIGN_CRITICAL       SMALL  avgt   30   0.026 ±  0.001  ms/op
XorTest.xor  FOREIGN_CRITICAL      MEDIUM  avgt   30   0.501 ±  0.002  ms/op
XorTest.xor  FOREIGN_CRITICAL       LARGE  avgt   30   2.578 ±  0.023  ms/op
XorTest.xor            UNSAFE       SMALL  avgt   30   0.029 ±  0.001  ms/op
XorTest.xor            UNSAFE      MEDIUM  avgt   30   1.300 ±  0.013  ms/op
XorTest.xor            UNSAFE       LARGE  avgt   30   7.632 ±  0.178  ms/op


The important part here is that `FOREIGN_CRITICAL` (the new feature) is on par with `JNI_CRITICAL`.
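
For readers who have not used the feature, here is a minimal sketch of what passing a heap segment through a critical downcall might look like. It assumes the API shape described in the PR, `Linker.Option.critical(true)`, as it ships in JDK 22 and later. It is not the benchmark code: the C library's `strlen` stands in for the native xor function, and the class and method names (`HeapSegmentSketch`, `nativeStrlen`) are hypothetical.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the patch or its benchmark.
class HeapSegmentSketch {
    static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        // critical(true) marks the call as critical AND allows heap segments
        // to be passed where the descriptor has an ADDRESS parameter.
        STRLEN = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                // size_t modelled as JAVA_LONG: an assumption that holds on 64-bit platforms.
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS),
                Linker.Option.critical(true));
    }

    static long nativeStrlen(String s) {
        // NUL-terminated bytes held in an ordinary Java array on the heap.
        byte[] bytes = (s + "\0").getBytes(StandardCharsets.US_ASCII);
        // Heap segment backed by the array; no copy to off-heap memory is made here.
        MemorySegment heap = MemorySegment.ofArray(bytes);
        try {
            // The native function sees a temporary address valid only for this call.
            return (long) STRLEN.invokeExact(heap);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("hello"));
    }
}
```

As with any critical call, the native function here returns quickly and makes no upcalls into Java. Without `critical(true)`, passing a heap segment to the same downcall handle would be rejected.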

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768370164


More information about the core-libs-dev mailing list