RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v3]
Jorn Vernee
jvernee at openjdk.org
Mon Jan 20 17:27:40 UTC 2025
On Mon, 20 Jan 2025 16:20:20 GMT, Matthias Ernst <duke at openjdk.org> wrote:
>> Certain signatures for foreign function calls (e.g. HVA return by value) require allocation of an intermediate buffer to adapt the FFM's to the native stub's calling convention. In the current implementation, this buffer is malloced and freed on every FFM invocation, a non-negligible overhead.
>>
>> Sample stack trace:
>>
>> java.lang.Thread.State: RUNNABLE
>> at jdk.internal.misc.Unsafe.allocateMemory0(java.base at 25-ea/Native Method)
>> ...
>> at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base at 25-ea/SharedUtils.java:386)
>> at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base at 25-ea/Unknown Source)
>> ...
>> at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base at 25-ea/Invokers$Holder)
>>
>>
>> To alleviate this, this PR remembers and reuses up to two small intermediate buffers per carrier-thread in subsequent calls.
>>
>> Performance (MBA M3):
>>
>>
>> master at 764d70b7df18e288582e616c62b0d7078f1ff3aa
>> Benchmark Mode Cnt Score Error Units
>> PointsAlloc.circle_by_ptr avgt 30 9.197 ? 0.037 ns/op
>> PointsAlloc.circle_by_value avgt 30 42.195 ? 0.088 ns/op <= #######
>> PointsAlloc.jni_ByteBuffer_alloc avgt 30 226.127 ? 35.378 ns/op
>> PointsAlloc.jni_long_alloc avgt 30 25.297 ? 2.457 ns/op
>> PointsAlloc.panama_alloc avgt 30 27.053 ? 1.915 ns/op
>>
>> After:
>> Benchmark Mode Cnt Score Error Units
>> PointsAlloc.circle_by_ptr avgt 30 9.156 ? 0.021 ns/op
>> PointsAlloc.circle_by_value avgt 30 11.995 ? 0.051 ns/op <= #######
>> PointsAlloc.jni_ByteBuffer_alloc avgt 30 211.161 ? 23.284 ns/op
>> PointsAlloc.jni_long_alloc avgt 30 24.885 ? 2.461 ns/op
>> PointsAlloc.panama_alloc avgt 30 26.905 ? 1.935 ns/op
>>
>>
>> `-prof gc` also shows that the new call path is fully scalar-replaced vs 160 byte/call before.
>
> Matthias Ernst has updated the pull request incrementally with three additional commits since the last revision:
>
> - shift api boundary
> - move bench
> - revert formatting
src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 393:
> 391: MemorySegment unscoped = CallBufferCache.acquireOrAllocate(size);
> 392: Arena scope = Arena.ofConfined();
> 393: MemorySegment source = unscoped.reinterpret(scope, null);
I suggest passing the `scope` to `acquireOrAllocate` as well, so we only need a single call to `reinterpret` (it has an overload that takes both a size and arena. (And I think you can then also remove the `@SuppressWarnings` from this method)
src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 395:
> 393: MemorySegment source = unscoped.reinterpret(scope, null);
> 394: // Preferable we'd like to register this cleanup in the line above
> 395: // but it breaks scalar replacement.
I have a fix for this issue. Will create a PR.
test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 54:
> 52: @State(org.openjdk.jmh.annotations.Scope.Thread)
> 53: @OutputTimeUnit(TimeUnit.NANOSECONDS)
> 54: @Fork(value = 1, jvmArgs = {"-Xlog:gc", "--enable-native-access=ALL-UNNAMED", "-Djava.library.path=micro/native"})
FWIW, unfortunately there is no builtin support to run profilers through `make`. I personally have a separate script that runs the benchmarks.jar:
./build/$CONF/images/jdk/bin/java -jar ./build/$CONF/images/test/micro/benchmarks.jar -prof gc AllocTest.alloc_confined
(where `$CONF` is your build configuration)
test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 98:
> 96: (SegmentAllocator) (_, _) -> dest,
> 97: phi);
> 98: }
Would it be viable to measure just a single invocation? That way we can remove other things like the `asSlice` call, which might introduce noise.
test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 99:
> 97: phi);
> 98: }
> 99: return points;
I don't think `points` needs to be returned here?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922701981
PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922701268
PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922699935
PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922700908
PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922698930
More information about the core-libs-dev
mailing list