RFR: 8347408: Create an internal method handle adapter for system calls with errno

Wed Feb 26 12:58:23 UTC 2025

As we convert older JDK code to the relatively new FFM API, system calls that report errors through `errno` and the like must explicitly allocate a `MemorySegment` to capture potential error state. If not designed carefully, this can hurt performance, and it also introduces unnecessary code complexity.

Hence, this PR proposes adding a JDK-internal method handle adapter that can be used to handle system calls that report errors via `errno`, `GetLastError`, or `WSAGetLastError`.

It relies on an efficient carrier-thread-local cache of memory regions to elide allocations.
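To illustrate the adapter idea, here is a minimal sketch that uses only `java.lang.invoke`, with an `int[]` standing in for the `MemorySegment` that captures call state. Everything in it is hypothetical and not the API proposed in this PR: the names `fakeSysCall`, `acquire`, and `resultOrNegatedError`, the plain `ThreadLocal` used instead of a carrier-thread-local cache, and the assumed convention of returning the negated error code on failure.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ErrnoAdapterSketch {

    // Hypothetical stand-in for a native call that reports errors via a capture buffer:
    // on failure it stores an error code in captureState[0] and returns -1.
    static int fakeSysCall(int[] captureState, int a, int b) {
        if (a < 0) {
            captureState[0] = 9; // pretend errno was set to 9
            return -1;
        }
        return a + b;
    }

    // Simplified cache: one reusable buffer per thread (an assumption of this sketch).
    private static final ThreadLocal<int[]> BUFFER = ThreadLocal.withInitial(() -> new int[1]);

    static int[] acquire() {
        return BUFFER.get();
    }

    // Assumed convention: the result if non-negative, otherwise the negated error code.
    static int resultOrNegatedError(int result, int[] captureState) {
        return result >= 0 ? result : -captureState[0];
    }

    // Turns a handle of type (int[], int, int)int into one of type (int, int)int.
    static MethodHandle adapt(MethodHandle target) throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle acquire = lookup.findStatic(ErrnoAdapterSketch.class, "acquire",
                MethodType.methodType(int[].class));
        MethodHandle combine = lookup.findStatic(ErrnoAdapterSketch.class, "resultOrNegatedError",
                MethodType.methodType(int.class, int.class, int[].class));
        // (int result, int[] cs, int a, int b)int: accept and ignore the call arguments
        combine = MethodHandles.dropArguments(combine, 2, int.class, int.class);
        // (int[] cs, int a, int b)int: run the target first, then combine its result with cs
        MethodHandle checked = MethodHandles.foldArguments(combine, target);
        // (int a, int b)int: fetch the buffer from the cache instead of taking it as an argument
        return MethodHandles.foldArguments(checked, acquire);
    }

    static int call(int a, int b) {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(ErrnoAdapterSketch.class,
                    "fakeSysCall",
                    MethodType.methodType(int.class, int[].class, int.class, int.class));
            return (int) adapt(target).invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(1, 2));  // prints 3
        System.out.println(call(-1, 0)); // prints -9
    }
}
```

In real use the adapted handle would be created once and kept in a `static final` field rather than rebuilt per call as `call` does here for brevity.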

Here are some benchmark results for virtual threads (the `OfVirtual` rows) and for platform threads, measured on an M1 Mac:

Benchmark                                                  Mode  Cnt   Score   Error  Units
CaptureStateUtilBench.OfVirtual.adaptedSysCallFail         avgt   30  24.330 ± 0.820  ns/op
CaptureStateUtilBench.OfVirtual.adaptedSysCallSuccess      avgt   30   8.257 ± 0.117  ns/op
CaptureStateUtilBench.OfVirtual.explicitAllocationFail     avgt   30  41.415 ± 1.013  ns/op
CaptureStateUtilBench.OfVirtual.explicitAllocationSuccess  avgt   30  21.720 ± 0.463  ns/op
CaptureStateUtilBench.OfVirtual.tlAllocationFail           avgt   30  23.636 ± 0.182  ns/op
CaptureStateUtilBench.OfVirtual.tlAllocationSuccess        avgt   30   8.234 ± 0.156  ns/op
CaptureStateUtilBench.adaptedSysCallFail                   avgt   30  23.918 ± 0.487  ns/op
CaptureStateUtilBench.adaptedSysCallSuccess                avgt   30   4.946 ± 0.089  ns/op
CaptureStateUtilBench.explicitAllocationFail               avgt   30  42.280 ± 1.128  ns/op
CaptureStateUtilBench.explicitAllocationSuccess            avgt   30  21.809 ± 0.413  ns/op
CaptureStateUtilBench.tlAllocationFail                     avgt   30  24.422 ± 0.673  ns/op
CaptureStateUtilBench.tlAllocationSuccess                  avgt   30   5.182 ± 0.152  ns/op

Adapted system call:

        return (int) ADAPTED_HANDLE.invoke(0, 0); // Uses a MH-internal pool

Explicit allocation:

        try (var arena = Arena.ofConfined()) {
            return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
        }

Thread-local allocation:

        try (var arena = POOLS.take()) {
            return (int) HANDLE.invoke(arena.allocate(4), 0, 0); // Uses a manually specified pool
        }

On the happy path on platform threads, the adapted system call is ~4.4x faster than the existing "explicit allocation" scheme (4.9 ns/op vs. 21.8 ns/op). Carrier threads that run virtual threads must share the cached memory across threads, so the virtual-thread case is a bit slower: "only" ~2.6x faster (8.3 ns/op vs. 21.7 ns/op).

![image](https://github.com/user-attachments/assets/ae826d95-ae9b-4d46-a03b-d342e058169d)
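The extra cost on virtual threads is plausibly related to guarding the cached memory: a virtual thread can be unmounted while it still holds the memory, and another virtual thread may then run on the same carrier. The sketch below shows one way such a guard could look, assuming a single cached buffer per carrier that is claimed with a compare-and-set and a fresh allocation as fallback. `SharedSlotSketch` and its methods are hypothetical names, and a `byte[]` stands in for native memory; this is not the mechanism used in the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SharedSlotSketch {

    // The recyclable buffer that one carrier thread would own (stand-in for native memory).
    private final byte[] cached;
    // True while some thread holds the cached buffer.
    private final AtomicBoolean inUse = new AtomicBoolean(false);

    SharedSlotSketch(int size) {
        this.cached = new byte[size];
    }

    // Returns the cached buffer if it is free, otherwise a freshly allocated one.
    byte[] acquire() {
        if (inUse.compareAndSet(false, true)) {
            return cached;
        }
        return new byte[cached.length];
    }

    // Makes the cached buffer available again; fresh buffers are simply dropped.
    void release(byte[] buffer) {
        if (buffer == cached) {
            inUse.set(false);
        }
    }

    boolean isCached(byte[] buffer) {
        return buffer == cached;
    }

    public static void main(String[] args) {
        SharedSlotSketch slot = new SharedSlotSketch(32);
        byte[] first = slot.acquire();
        byte[] second = slot.acquire(); // the slot is taken, so this one is fresh
        System.out.println(slot.isCached(first));  // prints true
        System.out.println(slot.isCached(second)); // prints false
        slot.release(second);
        slot.release(first);
        System.out.println(slot.isCached(slot.acquire())); // prints true
    }
}
```

The compare-and-set on every acquire is the kind of overhead a purely thread-confined pool can avoid, which would be consistent with the platform-thread numbers being lower.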

Here are some benchmarks for the underlying ArenaPool (M1 Mac):

Benchmark                                   (ELEM_SIZE)  Mode  Cnt   Score    Error  Units
ArenaPoolBench.OfVirtual.confined                     4  avgt   30  23.543 ±  0.168  ns/op
ArenaPoolBench.OfVirtual.confined                    64  avgt   30  27.384 ±  0.167  ns/op
ArenaPoolBench.OfVirtual.confined2                    4  avgt   30  47.811 ±  0.220  ns/op
ArenaPoolBench.OfVirtual.confined2                   64  avgt   30  55.404 ±  0.286  ns/op
ArenaPoolBench.OfVirtual.pooled                       4  avgt   30   8.210 ±  0.043  ns/op
ArenaPoolBench.OfVirtual.pooled                      64  avgt   30  45.525 ± 52.525  ns/op
ArenaPoolBench.OfVirtual.pooled2                      4  avgt   30  50.670 ±  0.778  ns/op
ArenaPoolBench.OfVirtual.pooled2                     64  avgt   30  85.846 ±  2.304  ns/op
ArenaPoolBench.confined                               4  avgt   30  23.286 ±  0.184  ns/op
ArenaPoolBench.confined                              64  avgt   30  27.026 ±  0.111  ns/op
ArenaPoolBench.confined2                              4  avgt   30  48.301 ±  0.942  ns/op
ArenaPoolBench.confined2                             64  avgt   30  57.512 ±  5.373  ns/op
ArenaPoolBench.pooled                                 4  avgt   30   5.085 ±  0.048  ns/op
ArenaPoolBench.pooled                                64  avgt   30  29.621 ±  0.440  ns/op
ArenaPoolBench.pooled2                                4  avgt   30  10.610 ±  0.339  ns/op
ArenaPoolBench.pooled2                               64  avgt   30  60.815 ±  1.046  ns/op
ArenaPoolFromBench.OfVirtual.confinedInt            N/A  avgt   30  21.944 ±  0.122  ns/op
ArenaPoolFromBench.OfVirtual.confinedSting          N/A  avgt   30  26.190 ±  0.193  ns/op
ArenaPoolFromBench.OfVirtual.pooledInt              N/A  avgt   30   8.217 ±  0.043  ns/op
ArenaPoolFromBench.OfVirtual.pooledString           N/A  avgt   30   9.271 ±  0.056  ns/op
ArenaPoolFromBench.confinedInt                      N/A  avgt   30  21.892 ±  0.139  ns/op
ArenaPoolFromBench.confinedSting                    N/A  avgt   30  26.012 ±  0.058  ns/op
ArenaPoolFromBench.pooledInt                        N/A  avgt   30   5.056 ±  0.034  ns/op
ArenaPoolFromBench.pooledString                     N/A  avgt   30   6.419 ±  0.037  ns/op

Note: The pool size for the above benchmarks was 32 bytes.

This PR relates to https://github.com/openjdk/jdk/pull/23391, which we had to back out. It attempts to ensure that the problems encountered there do not resurface.

For platform threads, the arena pool can share recyclable memory across several arenas.
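As a rough illustration of what sharing recyclable memory across several arenas could mean, the sketch below hands out slices of one small per-thread slab to nested scopes and falls back to an ordinary allocation when a request does not fit. It is only a sketch under stated assumptions: `ArenaPoolSketch`, `Scope`, `take`, and `bytesInUse` are hypothetical names, `ByteBuffer` stands in for `MemorySegment`, scopes are assumed to be closed in reverse order of creation on the thread that created them, and recycled memory is not zeroed.

```java
import java.nio.ByteBuffer;

final class ArenaPoolSketch {

    // Per-thread recyclable memory plus the number of bytes currently handed out.
    private static final class Slab {
        final ByteBuffer memory;
        int used;

        Slab(int size) {
            this.memory = ByteBuffer.allocate(size);
        }
    }

    private final int poolSize;
    private final ThreadLocal<Slab> slabs;

    ArenaPoolSketch(int poolSize) {
        this.poolSize = poolSize;
        this.slabs = ThreadLocal.withInitial(() -> new Slab(poolSize));
    }

    // Opens a new scope that allocates from the current thread's slab.
    Scope take() {
        return new Scope(slabs.get());
    }

    // Bytes of the current thread's slab that are handed out right now.
    int bytesInUse() {
        return slabs.get().used;
    }

    final class Scope implements AutoCloseable {
        private final Slab slab;
        private final int mark; // slab usage at the time this scope was opened

        private Scope(Slab slab) {
            this.slab = slab;
            this.mark = slab.used;
        }

        ByteBuffer allocate(int size) {
            if (size > poolSize - slab.used) {
                return ByteBuffer.allocate(size); // does not fit: ordinary allocation
            }
            ByteBuffer view = slab.memory.duplicate();
            view.position(slab.used);
            view.limit(slab.used + size);
            slab.used += size;
            return view.slice(); // a view of exactly `size` bytes of the slab
        }

        @Override
        public void close() {
            slab.used = mark; // give back everything this scope took from the slab
        }
    }

    public static void main(String[] args) {
        ArenaPoolSketch pool = new ArenaPoolSketch(32);
        try (Scope outer = pool.take()) {
            outer.allocate(4);
            try (Scope inner = pool.take()) {
                inner.allocate(8);  // fits: served from the shared slab
                inner.allocate(64); // too large: ordinary allocation
                System.out.println(pool.bytesInUse()); // prints 12
            }
            System.out.println(pool.bytesInUse()); // prints 4
        }
        System.out.println(pool.bytesInUse()); // prints 0
    }
}
```

With a 32-byte slab, as in the benchmarks above, a 64-byte request always takes the fallback path in this sketch, which is one possible reading of why the `pooled` rows with `ELEM_SIZE` 64 are no faster than the `confined` ones.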

This PR passes tier1, tier2, and tier3 testing.

-------------

Commit messages:
 - Use lazy initialization of method handles
 - Clean up visibility
 - Merge branch 'master' into errno-util3
 - Add @ForceInline annotations
 - Add out of order test for VTs
 - Allow memory reuse for several arenas
 - Remove file
 - Use more frequent allocations
 - Merge branch 'master' into errno-util3
 - Add unsafe variant
 - ... and 21 more: https://git.openjdk.org/jdk/compare/037e4711...907329e9

Changes: https://git.openjdk.org/jdk/pull/23765/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23765&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8347408
  Stats: 1902 lines in 13 files changed: 1891 ins; 2 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/23765.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23765/head:pull/23765

PR: https://git.openjdk.org/jdk/pull/23765