RFR: 8318986: Improve GenericWaitBarrier performance [v2]

Aleksey Shipilev shade at openjdk.org
Thu Nov 2 11:03:16 UTC 2023


> See the symptoms, reproducer and analysis in the bug.
> 
> The current code waits in `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time. Just waiting at `arm()` is insufficient, but we can use several `Semaphore`s to do what we want.
> 
> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave.
> 
> (AFAICS, the futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on the futex side, by assigning the "address" per futex.)
> 
> This issue affects everything except Linux. I initially found this on my M1 Mac, but I am pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the two orders of magnitude better safepoint times, but also the >2x more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is at least 2x higher, since we don't waste time at this wait barrier.
> 
> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/28cf22d3-b5ca-44fb-bde7-47189d14b47b)
> 
> Additional testing:
>   - [x] MacOS AArch64 server fastdebug, `tier1`
>   - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly)
>   - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly)
>   - [x] MacOS AArch64 server fastdebug, `tier2 tier3`
>   - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly)
>   - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly)
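
The reuse idea described in the quoted text (several semaphores, so that `disarm()` never has to wait for laggards) can be illustrated with a minimal sketch. This is not the code from the PR: `SimpleSemaphore`, `CellBarrier`, `kCells`, and `run_demo` are hypothetical names, the cell count of 4 is an arbitrary assumption, and a mutex per cell stands in for whatever lock-free accounting the real patch may use. It also assumes a single coordinator thread calls `arm()` and `disarm()`.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Tiny counting semaphore built from C++11 primitives (illustrative only).
class SimpleSemaphore {
  std::mutex m_;
  std::condition_variable cv_;
  int permits_ = 0;
public:
  void release(int n) {
    { std::lock_guard<std::mutex> g(m_); permits_ += n; }
    cv_.notify_all();
  }
  void acquire() {
    std::unique_lock<std::mutex> g(m_);
    cv_.wait(g, [this] { return permits_ > 0; });
    --permits_;
  }
};

// Hypothetical barrier: each tag maps to a cell with its own semaphore.
class CellBarrier {
  static constexpr int kCells = 4;  // assumed power of two
  struct Cell {
    std::mutex lock;
    SimpleSemaphore sem;
    int armed_tag = 0;    // 0 means "not armed"
    int waiters = 0;      // threads registered for the current arming
    int outstanding = 0;  // wakeups posted but not yet consumed
  };
  std::array<Cell, kCells> cells_;
  int current_tag_ = 0;   // touched only by the coordinator thread

  Cell& cell_for(int tag) { return cells_[tag & (kCells - 1)]; }

public:
  // Blocks only if this same cell still has laggards from kCells arms ago.
  void arm(int tag) {
    assert(tag != 0);
    Cell& c = cell_for(tag);
    for (;;) {
      std::unique_lock<std::mutex> g(c.lock);
      if (c.outstanding == 0) { c.armed_tag = tag; break; }
      g.unlock();
      std::this_thread::yield();
    }
    current_tag_ = tag;
  }

  // Posts wakeups and returns at once; it does not wait for waiters to leave.
  void disarm() {
    Cell& c = cell_for(current_tag_);
    current_tag_ = 0;
    int n;
    {
      std::lock_guard<std::mutex> g(c.lock);
      c.armed_tag = 0;
      n = c.waiters;
      c.waiters = 0;
      c.outstanding += n;
    }
    c.sem.release(n);
  }

  // Returns immediately if the barrier is no longer armed with this tag.
  void wait(int tag) {
    Cell& c = cell_for(tag);
    {
      std::lock_guard<std::mutex> g(c.lock);
      if (c.armed_tag != tag) return;
      ++c.waiters;
    }
    c.sem.acquire();
    std::lock_guard<std::mutex> g(c.lock);
    --c.outstanding;  // this wakeup is now consumed
  }
};

// Two back-to-back arm/disarm cycles; returns how many waiters got through.
int run_demo() {
  CellBarrier b;
  std::atomic<int> released{0};
  std::vector<std::thread> threads;
  b.arm(1);
  for (int i = 0; i < 4; i++) threads.emplace_back([&] { b.wait(1); released++; });
  b.disarm();
  b.arm(2);  // uses a different cell, so laggards on tag 1 do not delay it
  for (int i = 0; i < 4; i++) threads.emplace_back([&] { b.wait(2); released++; });
  b.disarm();
  for (auto& t : threads) t.join();
  return released.load();
}
```

In this sketch the only place the coordinator can block is `arm()`, and only when the tag wraps around to a cell whose previous waiters have not yet consumed their wakeups. With enough cells, that should be rare.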

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Tighten up memory ordering even more conservatively

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16404/files
  - new: https://git.openjdk.org/jdk/pull/16404/files/a3906108..ca88eb74

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=00-01

  Stats: 17 lines in 1 file changed: 10 ins; 3 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/16404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404

PR: https://git.openjdk.org/jdk/pull/16404

