[jdk21u-dev] RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention
Goetz Lindenmaier
goetz at openjdk.org
Fri Jul 4 10:25:43 UTC 2025
On Wed, 2 Jul 2025 17:37:42 GMT, Daniel Huang <duke at openjdk.org> wrote:
>> Backport for ShenandoahLock performance regression issue. The fix involves sleeping for a very short duration every 3 yields, with the number of yields picked through manual testing.
>>
>> Clean backport, ran GHA sanity checks and locally tested `tier1`, `tier2`, and `hotspot_gc_shenandoah`. `test/jdk/java/nio/channels/FileChannel/directio/DirectIOTest.java` sometimes fails locally, but it also sometimes failed before the backport.
>> `test/jdk/java/nio/channels/DatagramChannel/SendReceiveMaxSize.java` fails locally, but it also fails locally before the backport.
>
> Sure!
>
> The original fix had code to reproduce the bug:
>
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.Semaphore;
>
> public class Alloc {
> static final CountDownLatch startSignal = new CountDownLatch(1);
> static final Semaphore semaphore = new Semaphore(128);
> static final int THREADS = 1024; //64 threads per CPU core, 16 cores
> static final Object[] sinks = new Object[64 * THREADS];
> static volatile boolean start;
> static volatile boolean stop;
>
> private static void waitOnStartSignal() {
> try {
> startSignal.await();
> } catch (InterruptedException e) {
> throw new RuntimeException(e);
> }
> }
>
> public static void main(String... args) throws Throwable {
> for (int t = 0; t < THREADS; t++) {
> int ft = t;
> new Thread(() -> work(ft * 64)).start();
> }
>
> Thread.sleep(1000);
> startSignal.countDown();
> Thread.sleep(30_000);
> stop = true;
> }
>
> public static void work(int idx) {
> waitOnStartSignal();
> while (!stop) {
> semaphore.acquireUninterruptibly();
> try {
> sinks[idx] = new byte[128];
> } catch (Throwable ex) {
> throw new RuntimeException(ex);
> } finally {
> semaphore.release();
> }
> }
> }
> }
>
>
> I ran this on the command line with
> `.build/linux-x86_64-server-release/jdk/bin/java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr`
>
> Running this without the fix gives at-safepoint times
>
> 22273444
> 11615507
> 11297887
> 10424031
> 10117190
> 9789552
> 9754920
> 9599965
> 7477300
> 6897913
>
>
> Running with the backported fix gives at-safepoint times
>
> 15667088
> 8279113
> 3800276
> 853206
> 464314
> 399752
> 387322
> 381562
> 378641
> 358231
Hi @dtmhuang
well, that shows that the change does what it intends to do. That's not my point.
I want to make sure you cause no regressions of any other kind. jdk25, which you mention wrt. testing is not live yet.
I don't see why we should take any risk for a performance optimization that helps only in rare situations.
-------------
PR Comment: https://git.openjdk.org/jdk21u-dev/pull/1933#issuecomment-3035405536
More information about the jdk-updates-dev
mailing list