RFR: 8318986: Improve GenericWaitBarrier performance [v8]

Patricio Chilano Mateo pchilanomate at openjdk.org
Wed Nov 22 15:11:13 UTC 2023


On Tue, 21 Nov 2023 10:35:14 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the symptoms, reproducer and analysis in the bug.
>> 
>> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count.
>> 
>> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.)
>> 
>> This issue affects everything except Linux. I initially found this on my M1 Mac, but I am pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac, see the graph below. The new version gives **orders of magnitude** better safepoint times. This also translates to much more active GC and a higher attainable allocation rate, because GC throughput is no longer blocked by overly long safepoints.
>> 
>> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7)
>> 
>> Additional testing:
>>   - [x] MacOS AArch64 server fastdebug, `tier1`
>>   - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly)
>>   - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly)
>>   - [x] MacOS AArch64 server fastdebug, `tier2 tier3`
>>   - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly)
>>   - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly)
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>  - Do not SpinYield at disarm loop
>  - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>  - Drop the Linux check in preparation for integration
>  - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>  - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>  - Rework paddings
>  - Encode barrier tag into state, resolving another race condition
>  - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods
>  - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/20538340...e56a2bfa

I ran tiers 1-7, and there was one failure in tier5, in test vmTestbase/nsk/monitoring/stress/thread/strace016/TestDescription.java on windows-x64-debug. The output is:


#>  
#>  WARNING: switching log to verbose mode,
#>      because error is complained
#>  
ThreadMonitor> Test mode:	DIRECTLY access to MBean
ThreadController> number of created threads:	30
ThreadController> depth for all threads:	100
ThreadController> invocation type:	mixed

Starting threads.

ThreadController> locking threads

States of the threads are culminated.
# ERROR: 	Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE
The following stacktrace is for failure analysis.
nsk.share.TestFailure:  Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE
	at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:431)
	at nsk.share.Log.complain(Log.java:402)
	at nsk.monitoring.stress.thread.strace010.runIt(strace010.java:148)
	at nsk.monitoring.stress.thread.strace010.run(strace010.java:99)
	at nsk.monitoring.stress.thread.strace010.main(strace010.java:95)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
	at java.base/java.lang.Thread.run(Thread.java:1570)

Checked 6 BLOCKED threads
# ERROR: Expected amount: 7 for BLOCKED threads actual: 6
Checked 7 WAITING threads
Checked 8 TIMED_WAITING threads
Checked 9 RUNNABLE threads
# ERROR: Expected amount: 8 for RUNNABLE threads actual: 9


Test FAILED


#>  
#>  SUMMARY: Following errors occured
#>      during test execution:
#>  
# ERROR: 	Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE
# ERROR: Expected amount: 7 for BLOCKED threads actual: 6
# ERROR: Expected amount: 8 for RUNNABLE threads actual: 9


I re-ran tier5 twice and the test alone 100 times, but unfortunately couldn't reproduce the issue. I checked the failure history and haven't seen this test fail before. It could also be that the patch uncovered a pre-existing race in the test itself.

There are some jobs pending for macos-x64 (there is currently a bottleneck in the pipeline for this platform).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1822949463


More information about the hotspot-dev mailing list