RFR: 8347408: Create an internal method handle adapter for system calls with errno [v14]
Maurizio Cimadamore
mcimadamore at openjdk.org
Wed Jan 15 16:19:42 UTC 2025
On Wed, 15 Jan 2025 16:09:36 GMT, Per Minborg <pminborg at openjdk.org> wrote:
>> Going forward, converting older JDK code to use the relatively new FFM API requires that system calls which can provide `errno` and the like explicitly allocate a MemorySegment to capture potential error states. This can hurt performance if not designed carefully, and it also introduces unnecessary code complexity.
>>
>> Hence, this PR proposes to add a _JDK internal_ method handle adapter that can be used to handle system calls with `errno`, `GetLastError`, and `WSAGetLastError`.
>>
>> It currently relies on a thread-local cache of MemorySegments to elide allocations. If, in the future, a more efficient thread-associated allocation scheme becomes available, we could easily migrate to that one.
>>
>> Here are some benchmarks:
>>
>>
>> Benchmark                                        Mode  Cnt   Score   Error  Units
>> CaptureStateUtilBench.explicitAllocationFail     avgt   30  41.615 ± 1.203  ns/op
>> CaptureStateUtilBench.explicitAllocationSuccess  avgt   30  23.094 ± 0.580  ns/op
>> CaptureStateUtilBench.threadLocalFail            avgt   30  14.760 ± 0.078  ns/op
>> CaptureStateUtilBench.threadLocalReuseSuccess    avgt   30   7.189 ± 0.151  ns/op
>>
>>
>> Explicit allocation:
>>
>>     try (var arena = Arena.ofConfined()) {
>>         return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
>>     }
>>
>>
>> Thread Local (tl):
>>
>>     return (int) ADAPTED_HANDLE.invoke(arena.allocate(4), 0, 0);
>>
>>
>> The graph below shows the difference in latency for a successful call:
>>
>> 
>>
>> This is a ~3x improvement for both the happy and the error path.
>>
>>
>> Tested and passed tiers 1-3.
>
> Per Minborg has updated the pull request incrementally with two additional commits since the last revision:
>
> - Use invokeExact semantics in the tests
> - Clean up
src/java.base/share/classes/jdk/internal/foreign/CaptureStateUtil.java line 282:
> 280: * use in the boostrap sequence.
> 281: */
> 282: private static final class SegmentCache {
This abstraction seems very useful, and... it also strikes me as generalizable? It's effectively a one-element cache, with some logic to initialize the cached element (which could be provided by a lambda). Under the hood it uses a platform-thread local, and it only uses the cached element when it makes sense to do so (e.g. when there has not been a virtual thread switcheroo :-) ). In "unsafe" cases, we just compute the element using the user-provided lambda instead of using the cache. Am I dreaming?
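
To make the idea concrete, here is a minimal sketch of what such a generalized one-element cache might look like. Everything in it is hypothetical: the `OneElementCache` name and its `acquire`/`release`/`withElement` methods are made up for illustration, and a plain `ThreadLocal` stands in for whatever thread-associated storage the JDK-internal `SegmentCache` actually uses. The only point is the shape: a user-provided lambda creates elements, one element is cached per thread, and when the slot is empty (for example because the element is already in use) we simply compute a fresh element instead.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical one-element, per-thread cache (illustrative names only;
 * this is not the SegmentCache from the PR).
 */
final class OneElementCache<T> {

    // One slot per thread. The slot holds the element while nobody is using it.
    // Assumption: a plain ThreadLocal is used here as a stand-in.
    private final ThreadLocal<Object[]> slot =
            ThreadLocal.withInitial(() -> new Object[1]);

    // User-provided lambda that creates a new element. Must not return null.
    private final Supplier<? extends T> factory;

    OneElementCache(Supplier<? extends T> factory) {
        this.factory = factory;
    }

    /** Takes the cached element if there is one, otherwise computes a fresh one. */
    @SuppressWarnings("unchecked")
    T acquire() {
        Object[] s = slot.get();
        T cached = (T) s[0];
        if (cached != null) {
            s[0] = null;          // mark the slot as "in use"
            return cached;
        }
        return factory.get();     // empty slot: fall back to the lambda
    }

    /** Puts the element back if the current thread's slot is empty, else drops it. */
    void release(T element) {
        Object[] s = slot.get();
        if (s[0] == null) {
            s[0] = element;
        }
    }

    /** Convenience wrapper: acquire, apply the action, always release. */
    <R> R withElement(Function<? super T, ? extends R> action) {
        T element = acquire();
        try {
            return action.apply(element);
        } finally {
            release(element);
        }
    }
}
```

A caller would then write something like `cache.withElement(e -> use(e))`, where `use` is a placeholder for the real work. Because an element that cannot be returned to the slot is just dropped, the cache never hands the same element to two users at once, at the cost of an occasional extra allocation.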
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1916963768
More information about the core-libs-dev mailing list