RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13]

Wed Nov 10 18:13:38 UTC 2021

On Mon, 1 Nov 2021 13:11:40 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This test is too artificial. Going through my records, I found a microbenchmark for `java.util.concurrent.SynchronousQueue` which shows good improvements on jdk11. `SynchronousQueue` uses `onSpinWait` there, but since jdk17 it no longer does (see https://bugs.openjdk.java.net/browse/JDK-8267502). Maybe I can come up with a microbenchmark based on the jdk11 `SynchronousQueue` [code](https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java#L412):
>> 
>>         SNode awaitFulfill(SNode s, boolean timed, long nanos) {
>>             /*
>>              * When a node/thread is about to block, it sets its waiter
>>              * field and then rechecks state at least one more time
>>              * before actually parking, thus covering race vs
>>              * fulfiller noticing that waiter is non-null so should be
>>              * woken.
>>              *
>>              * When invoked by nodes that appear at the point of call
>>              * to be at the head of the stack, calls to park are
>>              * preceded by spins to avoid blocking when producers and
>>              * consumers are arriving very close in time.  This can
>>              * happen enough to bother only on multiprocessors.
>>              *
>>              * The order of checks for returning out of main loop
>>              * reflects fact that interrupts have precedence over
>>              * normal returns, which have precedence over
>>              * timeouts. (So, on timeout, one last check for match is
>>              * done before giving up.) Except that calls from untimed
>>              * SynchronousQueue.{poll/offer} don't check interrupts
>>              * and don't wait at all, so are trapped in transfer
>>              * method rather than calling awaitFulfill.
>>              */
>>             final long deadline = timed ? System.nanoTime() + nanos : 0L;
>>             Thread w = Thread.currentThread();
>>             int spins = shouldSpin(s)
>>                 ? (timed ? MAX_TIMED_SPINS : MAX_UNTIMED_SPINS)
>>                 : 0;
>>             for (;;) {
>>                 if (w.isInterrupted())
>>                     s.tryCancel();
>>                 SNode m = s.match;
>>                 if (m != null)
>>                     return m;
>>                 if (timed) {
>>                     nanos = deadline - System.nanoTime();
>>                     if (nanos <= 0L) {
>>                         s.tryCancel();
>>                         continue;
>>                     }
>>                 }
>>                 if (spins > 0) {
>>                     Thread.onSpinWait();
>>                     spins = shouldSpin(s) ? (spins - 1) : 0;
>>                 }
>>                 else if (s.waiter == null)
>>                     s.waiter = w; // establish waiter so can park next iter
>>                 else if (!timed)
>>                     LockSupport.park(this);
>>                 else if (nanos > SPIN_FOR_TIMEOUT_THRESHOLD)
>>                     LockSupport.parkNanos(this, nanos);
>>             }
>>         }
>> 
>> 
>> I've created https://bugs.openjdk.java.net/browse/JDK-8275728 to write such a microbenchmark.
>
> I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it?

Hi Andrew (@theRealAph),
I've created a PR: https://github.com/openjdk/jdk/pull/6338 with a microbenchmark.
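
For illustration only, here is a minimal sketch of the kind of producer/consumer handoff that exercises the spin-then-park path in the quoted `awaitFulfill`. It is not the benchmark in the PR above: the class name `HandoffSketch` and the method `runHandoffs` are hypothetical, the timing in `main` is a rough wall-clock figure rather than a JMH measurement, and the spin path only calls `Thread.onSpinWait()` on releases where `SynchronousQueue` still uses it (jdk11 in the discussion above).

```java
import java.util.concurrent.SynchronousQueue;

public class HandoffSketch {
    // Hypothetical helper: hands the values 0..count-1 from a producer thread
    // to the calling thread through a SynchronousQueue and returns their sum.
    static long runHandoffs(int count) throws InterruptedException {
        SynchronousQueue<Integer> queue = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i); // blocks until the consumer takes the element
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += queue.take(); // blocks until the producer offers an element
        }
        producer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        int count = 100_000;
        long start = System.nanoTime();
        long sum = runHandoffs(count);
        long elapsed = System.nanoTime() - start;
        // Rough figure only; a real measurement would need warmup and JMH.
        System.out.println("sum=" + sum + " ns/handoff=" + (elapsed / count));
    }
}
```

Because producer and consumer arrive very close in time, each `put`/`take` pair is likely to be resolved during the spin phase rather than by parking, which is the case the intrinsic is meant to help.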

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562