RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations

Jorn Vernee jvernee at openjdk.org
Sun Jan 19 21:09:15 UTC 2025


On Wed, 15 Jan 2025 21:39:05 GMT, Matthias Ernst <duke at openjdk.org> wrote:

> Certain signatures for foreign function calls require allocation of an intermediate buffer to adapt the FFM calling convention to that of the native stub ("needsReturnBuffer"). In the current implementation, this buffer is malloced and freed on every FFM invocation, which is a non-negligible overhead.
> 
> Sample stack trace:
> 
>    java.lang.Thread.State: RUNNABLE
> 	at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
> 	at jdk.internal.misc.Unsafe.allocateMemory(java.base@25-ea/Unsafe.java:636)
> 	at jdk.internal.foreign.SegmentFactories.allocateMemoryWrapper(java.base@25-ea/SegmentFactories.java:215)
> 	at jdk.internal.foreign.SegmentFactories.allocateSegment(java.base@25-ea/SegmentFactories.java:193)
> 	at jdk.internal.foreign.ArenaImpl.allocateNoInit(java.base@25-ea/ArenaImpl.java:55)
> 	at jdk.internal.foreign.ArenaImpl.allocate(java.base@25-ea/ArenaImpl.java:60)
> 	at jdk.internal.foreign.ArenaImpl.allocate(java.base@25-ea/ArenaImpl.java:34)
> 	at java.lang.foreign.SegmentAllocator.allocate(java.base@25-ea/SegmentAllocator.java:645)
> 	at jdk.internal.foreign.abi.SharedUtils$2.<init>(java.base@25-ea/SharedUtils.java:388)
> 	at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
> 	at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
> 	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@25-ea/DirectMethodHandle$Holder)
> 	at java.lang.invoke.LambdaForm$MH/0x000001f00109a400.invoke(java.base@25-ea/LambdaForm$MH)
> 	at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
> 
> 
> When does this happen? A fairly easy way to trigger this is through returning a small aggregate like the following:
> 
> struct Vector2D {
>   double x, y;
> };
> Vector2D Origin() {
>   return {0, 0};
> }
> 
> 
> On AArch64, such a struct is returned in two 128-bit registers, v0 and v1.
> The VM's calling convention for the native stub consequently expects a 32-byte output segment argument.
> The FFM downcall method handle instead expects to create a 16-byte result segment through the application-provided SegmentAllocator, and needs to perform an appropriate adaptation, roughly like so:
> 
>   MemorySegment downcallMH(SegmentAllocator a) {
>     MemorySegment tmp = SharedUtils.allocate(32);
>     try {
>       nativeStub.invoke(tmp);  // leaves v0, v1 in tmp
>       MemorySegment result = a.allocate(16);
>       result.setDouble(0, tmp.getDouble(0));
>       result.setDouble(8, tmp.getDouble(16));
>       return result;
>    ...
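
For readers who want to try the adaptation step described in the quote in isolation, here is a minimal sketch using only the public java.lang.foreign API (final since JDK 22). It is not the JDK implementation: `ReturnBufferSketch`, `fakeNativeStub` and `adapt` are hypothetical names, and the "stub" is plain Java that imitates the register spill by writing x at offset 0 (the v0 slot) and y at offset 16 (the v1 slot). Like the code path under discussion, it allocates and frees the 32-byte intermediate buffer on every call.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;

public class ReturnBufferSketch {
    // Intermediate buffer: two 128-bit register slots (v0, v1).
    static final long TMP_BYTES = 32;
    // Result: struct Vector2D { double x, y; }.
    static final long RESULT_BYTES = 16;

    // Hypothetical stand-in for the native stub. It writes x into the low
    // 8 bytes of the v0 slot and y into the low 8 bytes of the v1 slot.
    static void fakeNativeStub(MemorySegment tmp, double x, double y) {
        tmp.set(ValueLayout.JAVA_DOUBLE, 0, x);
        tmp.set(ValueLayout.JAVA_DOUBLE, 16, y);
    }

    // Mirrors the shape of the quoted pseudocode: allocate a temporary
    // buffer, let the "stub" fill it, then copy the two doubles into a
    // 16-byte segment obtained from the caller's allocator.
    static MemorySegment adapt(SegmentAllocator a, double x, double y) {
        try (Arena scratch = Arena.ofConfined()) {
            // Freed when scratch closes, i.e. one malloc/free per call.
            MemorySegment tmp = scratch.allocate(TMP_BYTES, 16);
            fakeNativeStub(tmp, x, y);
            MemorySegment result = a.allocate(RESULT_BYTES, 8);
            result.set(ValueLayout.JAVA_DOUBLE, 0, tmp.get(ValueLayout.JAVA_DOUBLE, 0));
            result.set(ValueLayout.JAVA_DOUBLE, 8, tmp.get(ValueLayout.JAVA_DOUBLE, 16));
            return result;
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment v = adapt(arena, 3.0, 4.0);
            System.out.println("x = " + v.get(ValueLayout.JAVA_DOUBLE, 0));
            System.out.println("y = " + v.get(ValueLayout.JAVA_DOUBLE, 8));
        }
    }
}
```

The returned segment belongs to the caller's allocator, so it stays valid after the scratch arena is closed; only the 32-byte temporary is released.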

Could you add the benchmark you're using to the PR as well? It should go under `./test/micro/org/openjdk/bench/java/lang/foreign/`. That would let others reproduce the results and, longer term, having a benchmark on file would let us detect performance regressions and improvements.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23142#issuecomment-2598550989

