RFR: 8347408: Create an internal method handle adapter for system calls with errno [v14]

Wed Jan 15 16:19:42 UTC 2025

On Wed, 15 Jan 2025 16:09:36 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> Going forward, converting older JDK code to use the relatively new FFM API requires system calls that can provide `errno` and the like to explicitly allocate a MemorySegment to capture potential error states. If not designed carefully, this can hurt performance, and it also introduces unnecessary code complexity.
>> 
>> Hence, this PR proposes to add a _JDK internal_ method handle adapter that can be used to handle system calls with `errno`, `GetLastError`, and `WSAGetLastError`.
>> 
>> It currently relies on a thread-local cache of MemorySegments to elide allocations. If a more efficient thread-associated allocation scheme becomes available in the future, we could easily migrate to it.
>> 
>> Here are some benchmarks:
>> 
>> 
>> Benchmark                                        Mode  Cnt   Score   Error  Units
>> CaptureStateUtilBench.explicitAllocationFail     avgt   30  41.615 ± 1.203  ns/op
>> CaptureStateUtilBench.explicitAllocationSuccess  avgt   30  23.094 ± 0.580  ns/op
>> CaptureStateUtilBench.threadLocalFail            avgt   30  14.760 ± 0.078  ns/op
>> CaptureStateUtilBench.threadLocalReuseSuccess    avgt   30   7.189 ± 0.151  ns/op
>> 
>> 
>> Explicit allocation:
>> 
>>         try (var arena = Arena.ofConfined()) {
>>             return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
>>         }
>> 
>> 
>> Thread Local (tl):
>> 
>>         return (int) ADAPTED_HANDLE.invoke(0, 0);
>> 
>> 
>> The graph below shows the difference in latency for a successful call:
>> 
>> ![image](https://github.com/user-attachments/assets/58fbef01-5d06-406c-87e6-75f468227fc6)
>> 
>> This is a ~3x improvement for both the happy and the error path (from 23.1 to 7.2 ns/op on success, and from 41.6 to 14.8 ns/op on failure).
>> 
>> 
>> Tested and passed tiers 1-3.
>
> Per Minborg has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Use invokeExact semantics in the tests
>  - Clean up

src/java.base/share/classes/jdk/internal/foreign/CaptureStateUtil.java line 282:

> 280:      * use in the bootstrap sequence.
> 281:      */
> 282:     private static final class SegmentCache {

This abstraction seems very useful, and... it also strikes me as generalizable? It's effectively a one-element cache, where there's some logic to initialize the cached element (which could be provided by a lambda). Then it's using a platform local under the hood and only using the cached element when it makes sense to do so (e.g. when there has not been a virtual thread switcharoo :-) ). In "unsafe" cases, we just compute the element using the user-provided lambda instead of using the cache. Am I dreaming?
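A minimal sketch of the generalized one-element cache described above, in plain Java and under stated assumptions: the names `OneElementCache`, `acquire` and `release` are hypothetical (not a JDK API); a public `ThreadLocal` stands in for whatever platform/carrier thread local the real `SegmentCache` uses; and the "unsafe" cases are approximated as (a) running on a virtual thread and (b) a nested acquire while the cached element is already handed out. In both cases the element is simply computed with the user-supplied lambda.

```java
import java.util.function.Supplier;

// Hypothetical generic one-element cache (illustration only, not a JDK API).
final class OneElementCache<T> {

    // Per-thread holder for the cached element plus an "in use" flag.
    private static final class Slot<E> {
        final E element;
        boolean inUse;
        Slot(E element) { this.element = element; }
    }

    private final Supplier<? extends T> factory;
    // Assumption: a plain ThreadLocal stands in for the JDK-internal thread local.
    private final ThreadLocal<Slot<T>> slots;

    OneElementCache(Supplier<? extends T> factory) {
        this.factory = factory;
        // The cached element is created lazily, once per platform thread.
        this.slots = ThreadLocal.withInitial(() -> new Slot<>(factory.get()));
    }

    /** Returns the cached element when it is safe to do so, else a fresh one. */
    T acquire() {
        if (Thread.currentThread().isVirtual()) {
            return factory.get();   // "unsafe" case: bypass the cache entirely
        }
        Slot<T> slot = slots.get();
        if (slot.inUse) {
            return factory.get();   // nested use: the cached element is taken
        }
        slot.inUse = true;
        return slot.element;
    }

    /** Hands an element back; only the cached element is actually recycled. */
    void release(T element) {
        if (Thread.currentThread().isVirtual()) {
            return;                 // nothing was cached on this path
        }
        Slot<T> slot = slots.get();
        if (slot.inUse && slot.element == element) {
            slot.inUse = false;
        }
    }
}
```

One design wrinkle: elements produced on the fallback path are not tracked, so if the elements were `MemorySegment`s they would need their own lifetime management (for example a confined `Arena` closed by the caller), which this sketch does not cover.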

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1916963768