RFR: 8318986: Improve GenericWaitBarrier performance

Thu Nov 2 21:00:18 UTC 2023

On Thu, 2 Nov 2023 08:19:35 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> @robehn, you might be interested in this :)

Yep, thanks for pinging me! I'll have a look when you are ready!

Regarding the bug: accounting for context switches in every 'sub-state' usually trips you up at least once.
That is what the Atomic::add(&_barrier_threads, 1); was for: "no marine left behind" :)

And while on this topic:
Note that there is an optimization that can be done on Linux as well.
When the VM thread wakes all threads via futex, it accumulates a lot of runtime, so two safepoints very close to each other are slow.
Meaning, if there is a new safepoint op pending, the VM thread will often be context-switched out just after waking all threads.
Secondly, the futex wake, which involves visiting all run queues, can be parallelized in a similar way by waking just a few threads and letting them wake the rest. Intel actually did a draft of that just before Meltdown/Spectre, so it got lost.
I think that draft woke something like 6 or 8 at a time; on a large system with ~128 cores you could get the time to full utilization down by almost 50% (but you lose some latency for the JavaThreads doing the second round of waking).
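
To make the two-round idea concrete, here is a rough, portable sketch. It is not the lost Intel draft and not HotSpot code: Slot, WakeStats and two_round_wake are hypothetical names, and a mutex/condition-variable pair stands in for the futex word. The caller (playing the VM thread) wakes only the first `fanout` waiters, and each of those wakes a strided share of the rest.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-waiter parking slot; stands in for a futex word.
struct Slot {
  std::mutex m;
  std::condition_variable cv;
  bool released = false;

  void park() {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return released; });
  }
  void unpark() {
    {
      std::lock_guard<std::mutex> l(m);
      released = true;  // flag set under the lock, so an early unpark is not lost
    }
    cv.notify_one();
  }
};

struct WakeStats {
  std::size_t woken;   // waiters that got past park()
  std::size_t direct;  // waiters woken by the caller itself
};

// Releases n parked waiters in two rounds (assumed scheme, for illustration):
// round 1: the caller wakes the first `fanout` waiters ("leaders");
// round 2: leader i wakes waiters i + leaders, i + 2 * leaders, ...
WakeStats two_round_wake(std::size_t n, std::size_t fanout) {
  const std::size_t leaders = std::min(std::max<std::size_t>(fanout, 1), n);
  assert(n == 0 || leaders > 0);  // at least one leader, or nobody would wake

  std::vector<Slot> slots(n);
  std::atomic<std::size_t> woken{0};
  std::vector<std::thread> threads;
  threads.reserve(n);

  for (std::size_t i = 0; i < n; ++i) {
    threads.emplace_back([&, i] {
      slots[i].park();
      woken.fetch_add(1);
      if (i < leaders) {
        // Second round: this leader wakes its share of the remaining waiters.
        for (std::size_t j = i + leaders; j < n; j += leaders) {
          slots[j].unpark();
        }
      }
    });
  }

  // First round: the caller only pays for `leaders` wakeups.
  std::size_t direct = 0;
  for (std::size_t i = 0; i < leaders; ++i) {
    slots[i].unpark();
    ++direct;
  }

  for (auto& t : threads) {
    t.join();
  }
  return WakeStats{woken.load(), direct};
}
```

With n = 128 and fanout = 8 the caller issues 8 wakeups instead of 128. The cost is the extra hop for the waiters released in the second round, which is the latency trade-off mentioned above.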

I mention this since you have set up measurements and graphs, so maybe you would like to continue on this code :) (there is no JIRA issue for this)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1790668558