RFR: 8347408: Create an internal method handle adapter for system calls with errno [v16]
Matthias Ernst
duke at openjdk.org
Thu Jan 16 15:21:40 UTC 2025
On Thu, 16 Jan 2025 11:58:20 GMT, Per Minborg <pminborg at openjdk.org> wrote:
>> Going forward, converting older JDK code to use the relatively new FFM API requires system calls that can provide `errno` and the likes to explicitly allocate a MemorySegment to capture potential error states. This can lead to negative performance implications if not designed carefully and also introduces unnecessary code complexity.
>>
>> Hence, this PR proposes to add a _JDK internal_ method handle adapter that can be used to handle system calls with `errno`, `GetLastError`, and `WSAGetLastError`.
>>
>> It currently relies on a thread-local cache of MemorySegments to elide allocations. If, in the future, a more efficient thread-associated allocation scheme becomes available, we could easily migrate to that one.
>>
>> Here are some benchmarks:
>>
>>
>> Benchmark                                        Mode  Cnt   Score   Error  Units
>> CaptureStateUtilBench.explicitAllocationFail     avgt   30  41.615 ± 1.203  ns/op
>> CaptureStateUtilBench.explicitAllocationSuccess  avgt   30  23.094 ± 0.580  ns/op
>> CaptureStateUtilBench.threadLocalFail            avgt   30  14.760 ± 0.078  ns/op
>> CaptureStateUtilBench.threadLocalReuseSuccess    avgt   30   7.189 ± 0.151  ns/op
>>
>>
>> Explicit allocation:
>>
>> try (var arena = Arena.ofConfined()) {
>>     return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
>> }
>>
>>
>> Thread Local (tl):
>>
>> return (int) ADAPTED_HANDLE.invoke(arena.allocate(4), 0, 0);
>>
>>
>> The graph below shows the difference in latency for a successful call:
>>
>> [Graph omitted: latency of a successful call, explicit allocation vs. thread-local reuse]
>>
>> This is a ~3x improvement for both the happy and the error path.
>>
>>
>> Tested and passed tiers 1-3.
>
> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unused class
As mentioned in https://mail.openjdk.org/pipermail/panama-dev/2025-January/020882.html, I've been looking at pretty much exactly the same thing for another buffer required in the call sequence, with slightly different takeaways:
* the implementation in your PR always leaves ownership of the segments inside the thread-local. Borrowing/releasing threads need to check "can I take this?" and "am I still on the right carrier, returning this to the right slot?". I find it easier to actually remove the element from the cache slot, and potentially return it to the cache slot of a different carrier thread. Ownership and responsibilities seem clearer to me that way.
* I've built on the assumption that the "CTL.get/test/modify" sequences are atomic with respect to virtual-thread scheduling (NIO code appears to rely on the same) and that no mounting/unmounting will happen inside them (though it _may_ happen within the acquire-invoke-release bracket; I have actually observed this between invoke and release). If rescheduling can happen inside these sequences and multiple threads contend for the carrier-thread-local element, I'm not sure there is a point to using a CTL in the first place.
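To illustrate the ownership-transfer design described above, here is a minimal sketch (not the PR's code; names and the element type are placeholders): take() removes the element from the slot entirely, so the borrower owns it outright and release() may legally hand it to a *different* carrier's slot.

```java
// Hypothetical sketch of a per-carrier-thread slot with ownership transfer.
// Thread-safety is intentionally ignored here; the point is the ownership
// protocol, not the synchronization.
final class OwnershipSlot<T> {
    private T element; // null while the element is borrowed (or never created)

    // Removes the element from the slot; the caller now owns it exclusively
    // and never needs to ask "am I still on the right carrier?" on release.
    T take() {
        T e = element;
        element = null;
        return e; // may be null: the caller must then allocate a fresh element
    }

    // Returns an element to this slot, which may belong to a different
    // carrier thread than the one it was taken from.
    // Returns false if the slot is already occupied (caller frees the element).
    boolean release(T e) {
        if (element != null) {
            return false;
        }
        element = e;
        return true;
    }
}
```

With this protocol, a borrowed element is simply absent from every slot, so the "taken" flag and the carrier-identity check disappear.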
src/java.base/share/classes/jdk/internal/foreign/CaptureStateUtil.java line 333:
> 331: SegmentPool cache = TL_POOLS.get();
> 332: if (cache == null) {
> 333: TL_POOLS.set(cache = new SegmentPool());
Why not use `initialValue`?
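For reference, the null-check-and-set pattern in the quoted lines could be written with `ThreadLocal.withInitial` (a sketch with a placeholder pool type; the real `SegmentPool` lives in the PR, and there may be a deliberate reason the PR avoids this):

```java
final class PoolHolder {
    // Placeholder stand-in for the PR's SegmentPool type.
    static final class SegmentPool { }

    // withInitial creates the pool lazily on each thread's first get(),
    // removing the explicit "if (cache == null) set(...)" dance.
    static final ThreadLocal<SegmentPool> TL_POOLS =
            ThreadLocal.withInitial(SegmentPool::new);
}
```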
src/java.base/share/classes/jdk/internal/util/SingleElementPool.java line 124:
> 122: public void close() {
> 123: if (UNSAFE.compareAndSetInt(this, CLOSED_OFFSET, FALSE, TRUE)) {
> 124: if (UNSAFE.getIntVolatile(this, POOLED_ELEMENT_TAKEN_OFFSET) == FALSE) {
If we follow the premise that "any scheduling can happen", then this `close` could be racing with another thread trying to borrow the pooled element of this already-dead carrier thread:

    VT at CT1:
        pool = tl.get()
        unmount
    CT1 dies =>
        pool.close()
        taken == false => recycle
    VT at CT2:
        mount
        pool.take() succeeds since `taken` is still false
        BOOM

I believe the close action would need to "take" the element before recycling it.
Wouldn't this obviate the need for a separate "closed" slot?
Again, I'm not sure whether this scheduling is actually possible at this time.
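A minimal sketch of the "close must take first" idea (hypothetical code, not the PR's; `recycled` stands in for freeing the segment): close() claims the element through the same `taken` flag that take() uses, so a racing take() on another carrier can never win after close() has, and the separate closed slot becomes unnecessary for this race.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ClosablePool {
    private final AtomicBoolean taken = new AtomicBoolean(false);
    private boolean recycled;

    // Succeeds only if the element was free; the winner owns the element.
    boolean take() {
        return taken.compareAndSet(false, true);
    }

    void release() {
        taken.set(false);
    }

    void close() {
        // Winning this CAS doubles as both "closed" and "taken":
        // any later take() fails, and we recycle exactly once.
        if (taken.compareAndSet(false, true)) {
            recycled = true; // stand-in for actually freeing the segment
        }
        // If the CAS fails, a borrower holds the element; a full
        // implementation would arrange for the releasing side to detect
        // the closed state and recycle on release instead.
    }

    boolean wasRecycled() {
        return recycled;
    }
}
```

The leftover case (borrower releases after close) still needs handling, but the take/close race itself is resolved by a single flag.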
src/java.base/share/classes/jdk/internal/util/SingleElementPool.java line 132:
> 130: @ForceInline
> 131: private boolean takePooledElement() {
> 132: return UNSAFE.getAndSetInt(this, POOLED_ELEMENT_TAKEN_OFFSET, TRUE) == FALSE;
Does this need to use atomics? Similar code in sun.nio.ch.Util.BufferCache seems to work without them. My interpretation is that BufferCache relies on a virtual thread not being preempted inside the { CTL.get().takeElement } sequence. Would the same reasoning not work here?
I.e.

    element = carrierThreadLocal.get()   [1]
    if (element.x)                       [2]
    element.y                            [3]

a) can this virtual thread be moved to another carrier at [2] or [3]?
b) can this virtual thread be suspended and another one scheduled between [1] and [2]/[3]?

My understanding is that in the current implementation of virtual threads this does not happen, and we can manage the cache element with straight non-atomic code. Is that correct? If not, how does BufferCache manage (is it pinned somewhere)?
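Spelled out, the non-atomic variant the questions above assume would look like this sketch (hypothetical code; `Object` stands in for a MemorySegment). It is correct only under the stated assumption that no carrier switch or preemption can occur between the plain read at [2] and the plain write at [3]:

```java
final class PlainSlot {
    private boolean taken;                       // plain field, no volatile/CAS
    private final Object element = new Object(); // stand-in for a MemorySegment

    // Returns the element if free, else null.
    Object tryTake() {
        if (taken) {      // [2] plain read: safe only if no carrier switch
            return null;  //     can happen between [1]/[2] and [3]
        }
        taken = true;     // [3] plain write
        return element;
    }

    void release() {
        taken = false;
    }
}
```

If the scheduling assumption ever breaks, two virtual threads on the same carrier-thread-local could both pass [2] before either reaches [3] and hand out the same element.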
src/java.base/share/classes/jdk/internal/util/SingleElementPool.java line 137:
> 135: @ForceInline
> 136: private void releasePooledElement() {
> 137: UNSAFE.putIntVolatile(this, POOLED_ELEMENT_TAKEN_OFFSET, FALSE);
Similar to the above: the use of volatile suggests that you're concerned about a virtual-thread switch. It would be good to have clarification on this.
-------------
PR Review: https://git.openjdk.org/jdk/pull/22391#pullrequestreview-2556247903
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918610681
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918706463
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918637436
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918648090