RFR: 8350285: Regression caused by ShenandoahLock under extreme contention

Xiaolong Peng xpeng at openjdk.org
Wed Feb 19 20:58:05 UTC 2025


On Wed, 19 Feb 2025 16:56:14 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> We have noticed a significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570). A local reproducer was written to reproduce the issue; here is the top-N at-safepoint time comparison in `ns`:
>> 
>> Tip:
>> 
>> 94069776
>> 50993550
>> 49321667
>> 33903446
>> 32291313
>> 30587810
>> 27759958
>> 25080997
>> 24657404
>> 23874338
>> 
>> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570)
>> 
>> 58428998
>> 44410618
>> 30788370
>> 20636942
>> 15986465
>> 15307468
>> 9686426
>> 9432094
>> 7473938
>> 6854014
>> 
>> Note: command line for the test:
>> 
>> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing  -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr
>> 
>> 
>> With further digging, we found the real problem is that there are more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570), which causes the VM_Thread to take longer to disarm the wait barrier when leaving a safepoint. Fixing the issue in VM_Thread would benefit other GCs as well, but it is more complicated (see the details here: https://bugs.openjdk.org/browse/JDK-8350324).
>> With some tweaks in ShenandoahLock, we can mitigate the regression caused by the [PR](https://github.com/openjdk/jdk/pull/19570) and also improve the long tail of at-safepoint times by more than 10x. Here is the result from the same test with the changes of this PR:
>> 
>> 
>> 1890706
>> 1222180
>> 1042758
>> 853157
>> 792057
>> 785697
>> 780627
>> 757817
>> 740607
>> 736646
>> 725727
>> 725596
>> 724106
>> 
>> 
>> ### Other test
>> - [x] `make test TEST=hotspot_gc_shenandoah`
>> - [x] Tier 2
>
> src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 71:
> 
>> 69:         while (SafepointSynchronize::is_synchronizing() &&
>> 70:                !SafepointMechanism::local_poll_armed(java_thread)) {
>> 71:           short_sleep();
> 
> Why not `yield_or_sleep` here?

It can be `yield_or_sleep` here now; I'll rerun a test to verify. I have updated `yield_or_sleep` to reset the counter after `short_sleep`.

In our test last year, we noticed a Java thread may run this loop over 20k times in the worst case. In the older version the `yields` counter was never reset, so after the safepoint some Java threads would only do `short_sleep`, which may increase allocation latency. I don't want waiting on the safepoint poll to impact the allocation path after the safepoint.
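To make the counter-reset point concrete, here is a hedged sketch of the policy being described (standalone C++ using `std::this_thread` as stand-ins for HotSpot's `os::naked_yield()` and `os::naked_short_nanosleep()`; the names and the thresholds `5` and 100us are taken from the quoted snippets, not from the final patch):

```cpp
#include <chrono>
#include <thread>

// Stand-in for os::naked_short_nanosleep(100000) in HotSpot: sleep ~100us.
static void short_sleep() {
  std::this_thread::sleep_for(std::chrono::microseconds(100));
}

// Sketch of the discussed policy: yield for the first few iterations, then
// sleep once and reset the counter, so a thread that later re-enters the
// loop (e.g. on the allocation path after the safepoint) starts with cheap
// yields again instead of being stuck in the sleep branch.
static void yield_or_sleep(int& yields) {
  if (yields < 5) {
    std::this_thread::yield();
    yields++;
  } else {
    short_sleep();
    yields = 0;  // reset after sleeping, per the change described above
  }
}
```

Without the `yields = 0` reset, a thread that spun ~20k times while waiting on the safepoint would keep taking the `short_sleep` branch forever afterwards.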

> src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 48:
> 
>> 46:   void yield_or_sleep(int &yields) {
>> 47:     if (yields < 5) {
>> 48:       os::naked_yield();
> 
> Need `#include "runtime/os.hpp"` for this. There is likely a transitive dependency now, but it is cleaner to depend explicitly. Or, maybe move this definition to `shenandoahLock.cpp`, it would be even cleaner then, I think.

Thanks, I just moved the definition to `shenandoahLock.cpp`; it can be static as well.

> src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 61:
> 
>> 59: #else
>> 60:     os::naked_short_nanosleep(100000);
>> 61: #endif
> 
> Any context where this is coming from? This looks like from `SpinYield`? If so, should we target `SpinYield::default_sleep_ns=1000`?

It is just a magic number we arrived at by testing, not from `SpinYield`. I chose 100us because a Shenandoah GC pause is usually less than 1ms; I also tested 10us, but 100us performed better in the test.
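For illustration, the empirically chosen interval can be written out as a named constant (a sketch in standalone C++; `kShortSleepNs` is a hypothetical name, and `std::this_thread::sleep_for` stands in for HotSpot's `os::naked_short_nanosleep`):

```cpp
#include <chrono>
#include <thread>

// 100us = 100,000 ns, chosen empirically in the tests above: a Shenandoah GC
// pause is usually < 1ms, so 100us keeps wake-up latency well below a typical
// pause without busy-spinning; 10us was also tried but performed worse.
static const long long kShortSleepNs = 100000;

static void short_sleep() {
  // Stand-in for os::naked_short_nanosleep(kShortSleepNs) in HotSpot.
  std::this_thread::sleep_for(std::chrono::nanoseconds(kShortSleepNs));
}
```

The trade-off is the usual one for polling loops: a shorter sleep wakes threads sooner when the safepoint ends but burns more CPU while it lasts.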

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962088642
PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962243559
PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962094983


More information about the hotspot-gc-dev mailing list