[jdk21u-dev] RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention

Daniel Huang duke at openjdk.org
Wed Jul 2 17:40:44 UTC 2025


On Mon, 30 Jun 2025 21:03:48 GMT, Daniel Huang <duke at openjdk.org> wrote:

> Backport of the fix for a ShenandoahLock performance regression. With the fix, threads waiting on the lock sleep for a very short duration after every 3 yields; the number of yields was chosen through manual testing.
> 
> Clean backport. I ran the GHA sanity checks and tested `tier1`, `tier2`, and `hotspot_gc_shenandoah` locally. `test/jdk/java/nio/channels/FileChannel/directio/DirectIOTest.java` fails intermittently locally, but it also failed intermittently before the backport.
> `test/jdk/java/nio/channels/DatagramChannel/SendReceiveMaxSize.java` fails locally both with and without the backport.
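The yield-then-sleep backoff described in the PR can be sketched in Java for readers less familiar with the HotSpot code. This is a hypothetical analogue, not the C++ `ShenandoahLock` implementation: the class name `YieldSleepLock`, its test-and-test-and-set loop, and the sleep length `SLEEP_NANOS` are assumptions for illustration. Only the "sleep after every 3 yields" policy comes from the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical Java analogue of a yield-then-sleep backoff lock.
// HotSpot's ShenandoahLock is C++ and differs in detail.
final class YieldSleepLock {
    // Yields before each short sleep (the count described in the PR).
    private static final int YIELDS_PER_SLEEP = 3;
    // Assumed sleep length; the PR only says "a very short duration".
    private static final long SLEEP_NANOS = 100_000L;

    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        int yields = 0;
        // Test-and-test-and-set: read first so waiters avoid needless CAS traffic.
        while (held.get() || !held.compareAndSet(false, true)) {
            if (yields < YIELDS_PER_SLEEP) {
                Thread.yield();
                yields++;
            } else {
                // Back off briefly, then start a new round of yields.
                LockSupport.parkNanos(SLEEP_NANOS);
                yields = 0;
            }
        }
    }

    void unlock() {
        held.set(false);
    }

    // Usage demo: several threads increment a shared counter under the lock.
    static long demo(int threads, int iterations) {
        YieldSleepLock lock = new YieldSleepLock();
        long[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 10_000)); // prints 80000
    }
}
```

A count other than `threads * iterations` from `demo` would indicate a mutual-exclusion bug. The periodic sleep is the part the fix adds: yielding alone can return immediately, so a short bounded sleep takes waiters off the CPU for a while.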

Sure!

The original fix had code to reproduce the bug:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

public class Alloc {
    static final CountDownLatch startSignal = new CountDownLatch(1);
    static final Semaphore semaphore = new Semaphore(128);
    static final int THREADS = 1024; // 64 threads per CPU core, 16 cores
    static final Object[] sinks = new Object[64 * THREADS];
    static volatile boolean start;
    static volatile boolean stop;

    private static void waitOnStartSignal() {
        try {
            startSignal.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String... args) throws Throwable {
        for (int t = 0; t < THREADS; t++) {
            int ft = t;
            new Thread(() -> work(ft * 64)).start();
        }

        Thread.sleep(1000);
        startSignal.countDown();
        Thread.sleep(30_000);
        stop = true;
    }

    public static void work(int idx) {
        waitOnStartSignal();
        while (!stop) {
            semaphore.acquireUninterruptibly();
            try {
                sinks[idx] = new byte[128];
            } catch (Throwable ex) {
                throw new RuntimeException(ex);
            } finally {
                semaphore.release();
            }
        }
    }
}
```


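I ran this from the command line with the following pipeline, which extracts the `At safepoint` time (in ns) of every safepoint from the log and sorts the values in descending order:
`./build/linux-x86_64-server-release/jdk/bin/java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr`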
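The two `grep` stages and `sort -nr` keep only the `At safepoint: <n> ns` field of each log line and sort the numbers largest first. A rough Java equivalent might look like the sketch below; the class name `SafepointTimes` is hypothetical, and it matches the same text the `grep` patterns match without assuming anything else about the `-Xlog:safepoint` line layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough equivalent of:
//   grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr
final class SafepointTimes {
    private static final Pattern AT_SAFEPOINT = Pattern.compile("At safepoint: (\\d+) ns");

    // Returns every "At safepoint" value found in the lines, largest first.
    static List<Long> extract(List<String> lines) {
        List<Long> times = new ArrayList<>();
        for (String line : lines) {
            Matcher m = AT_SAFEPOINT.matcher(line);
            while (m.find()) {
                times.add(Long.parseLong(m.group(1)));
            }
        }
        times.sort(Comparator.reverseOrder());
        return times;
    }

    public static void main(String[] args) throws IOException {
        // Reads JVM log output from stdin, one line at a time.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().toList();
        for (long ns : extract(lines)) {
            System.out.println(ns);
        }
    }
}
```

To use it, pipe the `java ... Alloc.java` output into `java SafepointTimes.java` instead of the `grep`/`sort` stages.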

Without the fix, the top of the sorted output (at-safepoint times in ns) is:

22273444
11615507
11297887
10424031
10117190
9789552
9754920
9599965
7477300
6897913


With the backported fix, the top of the sorted output (at-safepoint times in ns) is:

15667088
8279113
3800276
853206
464314
399752
387322
381562
378641
358231
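Because each list shows only the longest pauses from a single run, comparing their medians gives a rough sense of how much the tail shrank. The sketch below computes them from the values quoted above; the class name `SafepointMedians` is hypothetical.

```java
import java.util.Arrays;

// Medians of the ten at-safepoint times (ns) listed above for each build.
final class SafepointMedians {
    static final long[] WITHOUT_FIX = {
        22273444, 11615507, 11297887, 10424031, 10117190,
        9789552, 9754920, 9599965, 7477300, 6897913
    };
    static final long[] WITH_FIX = {
        15667088, 8279113, 3800276, 853206, 464314,
        399752, 387322, 381562, 378641, 358231
    };

    // Median of an even- or odd-length sample; does not modify the input.
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    public static void main(String[] args) {
        long before = median(WITHOUT_FIX);
        long after = median(WITH_FIX);
        System.out.printf("median without fix: %d ns%n", before);
        System.out.printf("median with fix:    %d ns%n", after);
        System.out.printf("ratio: %.1fx%n", (double) before / after);
    }
}
```

It prints medians of 9953371 ns (about 10 ms) and 432033 ns (about 0.43 ms), roughly a 23x reduction, while the single longest pause fell less, from about 22.3 ms to about 15.7 ms.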

-------------

PR Comment: https://git.openjdk.org/jdk21u-dev/pull/1933#issuecomment-3028728951


More information about the jdk-updates-dev mailing list