RFR: 8350285: Regression caused by ShenandoahLock under extreme contention

Aleksey Shipilev shade at openjdk.org
Wed Feb 19 20:58:05 UTC 2025


On Wed, 19 Feb 2025 15:58:01 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> We have noticed there is significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570), a local reproducer was written to reproduce the issue, here is the top N at-safepoint time in `ns` comparison:
> 
> Tip:
> 
> 94069776
> 50993550
> 49321667
> 33903446
> 32291313
> 30587810
> 27759958
> 25080997
> 24657404
> 23874338
> 
> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570)
> 
> 58428998
> 44410618
> 30788370
> 20636942
> 15986465
> 15307468
> 9686426
> 9432094
> 7473938
> 6854014
> 
> Note: command line for the test:
> 
> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing  -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr
> 
> 
> With further digging, we found the real problem is more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) causes longer time for VM_Thread to call disarm wait barrier when leaving safepoint. Fixing in the issue in VM_Thread benefits other GCs as well but it is more complicated(see the details here https://bugs.openjdk.org/browse/JDK-8350324). 
> With some tweaks in ShenandoahLock, we could mitigate the regression caused by [PR](https://github.com/openjdk/jdk/pull/19570), also improve the long tails of at-saftpoint time by more than 10x, here is the result from the same test with this changes of this PR:
> 
> 
> 1890706
> 1222180
> 1042758
> 853157
> 792057
> 785697
> 780627
> 757817
> 740607
> 736646
> 725727
> 725596
> 724106
> 
> 
> ### Other test
> - [x] `make test TEST=hotspot_gc_shenandoah`
> - [x] Tier 2

All right, assuming performance results are good. Consider a minor nit:

src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 71:

> 69:         while (SafepointSynchronize::is_synchronizing() &&
> 70:                !SafepointMechanism::local_poll_armed(java_thread)) {
> 71:           short_sleep();

Why not `yield_or_sleep` here?

src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 48:

> 46:   void yield_or_sleep(int &yields) {
> 47:     if (yields < 5) {
> 48:       os::naked_yield();

Need `#include "runtime/os.hpp"` for this. There is likely a transitive dependency now, but it is cleaner to depend explicitly. Or, maybe move this definition to `shenandoahLock.cpp`, it would be even cleaner then, I think.

src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 61:

> 59: #else
> 60:     os::naked_short_nanosleep(100000);
> 61: #endif

Any context where this is coming from? This looks like from `SpinYield`? If so, should we target `SpinYield::default_sleep_ns=1000`?

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23701#pullrequestreview-2627707591
PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962048735
PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962211656
PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962050750


More information about the hotspot-gc-dev mailing list