[jdk17u-dev] RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention [v2]

Daniel Huang duke at openjdk.org
Wed Jul 2 17:57:45 UTC 2025


On Mon, 30 Jun 2025 16:28:27 GMT, Daniel Huang <duke at openjdk.org> wrote:

>> Backport for ShenandoahLock performance regression issue. The fix involves sleeping for a very short duration every 3 yields, with the number of yields picked through manual testing. 
>> 
>> Clean backport, ran GHA sanity checks and locally tested `tier1`, `tier2`, and `hotspot_gc_shenandoah`. `test/jdk/java/nio/channels/FileChannel/directio/DirectIOTest.java` sometimes fails locally, but it also sometimes failed before the backport.
>> `test/jdk/java/nio/channels/DatagramChannel/SendReceiveMaxSize.java` fails locally, but it also fails locally before the backport.
>
> Daniel Huang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into backport-JDK-8350285-shenandoahlock
>  - Merge branch 'openjdk:master' into backport-JDK-8350285-shenandoahlock
>  - Backport bd8ad309b59bceb3073a8d6411cca74e73508885
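
For readers unfamiliar with the change, the "sleep briefly every 3 yields" idea described above can be sketched in Java. This is only an illustration of the backoff pattern, not the HotSpot code: the actual fix lives in the C++ `ShenandoahLock` implementation, and the class name `BackoffLockSketch`, the helper `runCounters`, and the 10 µs sleep duration are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class BackoffLockSketch {
    // From the PR description: sleep after every 3 yields.
    static final int YIELDS_BEFORE_SLEEP = 3;
    // Assumed value for illustration; the real sleep duration is not stated in this thread.
    static final long SLEEP_NANOS = 10_000L;

    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        int yields = 0;
        while (!locked.compareAndSet(false, true)) {
            if (yields < YIELDS_BEFORE_SLEEP) {
                // Cheap backoff: give other runnable threads a chance.
                Thread.yield();
                yields++;
            } else {
                // Under heavy contention, yielding alone keeps CPUs busy;
                // a short sleep takes this thread off the run queue.
                LockSupport.parkNanos(SLEEP_NANOS);
                yields = 0;
            }
        }
    }

    public void unlock() {
        locked.set(false);
    }

    // Hypothetical helper: increments a shared counter under the lock from several threads.
    static int runCounters(int threads, int incrementsPerThread) {
        BackoffLockSketch lock = new BackoffLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runCounters(8, 10_000));
    }
}
```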

I ran an additional test using the reproducer that came with the original fix:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

public class Alloc {
    static final CountDownLatch startSignal = new CountDownLatch(1);
    static final Semaphore semaphore = new Semaphore(128);
    static final int THREADS = 1024; // 64 threads per CPU core, 16 cores
    static final Object[] sinks = new Object[64 * THREADS];
    static volatile boolean start;
    static volatile boolean stop;

    private static void waitOnStartSignal() {
        try {
            startSignal.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String... args) throws Throwable {
        for (int t = 0; t < THREADS; t++) {
            int ft = t;
            new Thread(() -> work(ft * 64)).start();
        }

        Thread.sleep(1000);
        startSignal.countDown();
        Thread.sleep(30_000);
        stop = true;
    }

    public static void work(int idx) {
        waitOnStartSignal();
        while (!stop) {
            semaphore.acquireUninterruptibly();
            try {
                sinks[idx] = new byte[128];
            } catch (Throwable ex) {
                throw new RuntimeException(ex);
            } finally {
                semaphore.release();
            }
        }
    }
}


I ran this on the command line with
`.build/linux-x86_64-server-release/jdk/bin/java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr`
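
If `grep -P` is not available, the same extraction can be done with a small Java program that reads the log from standard input. This is a rough equivalent of the `grep | grep | sort -nr` pipeline, not something that was used for the numbers in this comment; the class name `SafepointTimes` is made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SafepointTimes {
    // Same pattern as the grep above, with the number captured as group 1.
    private static final Pattern AT_SAFEPOINT = Pattern.compile("At safepoint: (\\d+) ns");

    // Returns every "At safepoint" duration (ns) found in the lines, largest first.
    static List<Long> extract(List<String> lines) {
        List<Long> times = new ArrayList<>();
        for (String line : lines) {
            Matcher m = AT_SAFEPOINT.matcher(line);
            while (m.find()) {
                times.add(Long.parseLong(m.group(1)));
            }
        }
        times.sort(Comparator.reverseOrder());
        return times;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        for (long ns : extract(lines)) {
            System.out.println(ns);
        }
    }
}
```

Usage would be to pipe the `java ... Alloc.java` output into `java SafepointTimes.java` instead of the grep/sort chain.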

Running this without the fix gives the following at-safepoint times (in ns, sorted in descending order):

78352042
71620203
69088015
63752840
61954634
57154656
55503155
50447201
49957634
42141662


Running with the backported fix gives the following at-safepoint times (in ns, sorted in descending order):

14292926
5145001
3579042
3204251
1854438
1734104
1664821
1647101
1621830
1542978
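
To put the two lists side by side: the largest listed at-safepoint time drops from about 78.4 ms to about 14.3 ms, roughly 5.5 times lower. The small program below computes the maximum and median of the values listed above; it only summarizes these ten values per run, and the class name `SafepointSummary` is made up for the example.

```java
import java.util.Arrays;

public class SafepointSummary {
    // At-safepoint times in ns, copied from the two lists above.
    static final long[] WITHOUT_FIX = {
        78352042L, 71620203L, 69088015L, 63752840L, 61954634L,
        57154656L, 55503155L, 50447201L, 49957634L, 42141662L
    };
    static final long[] WITH_FIX = {
        14292926L, 5145001L, 3579042L, 3204251L, 1854438L,
        1734104L, 1664821L, 1647101L, 1621830L, 1542978L
    };

    static long max(long[] values) {
        return Arrays.stream(values).max().getAsLong();
    }

    // Median of the values; for an even count, the mean of the two middle values.
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    public static void main(String[] args) {
        System.out.printf("max:    %.2f ms -> %.2f ms%n",
                max(WITHOUT_FIX) / 1e6, max(WITH_FIX) / 1e6);
        System.out.printf("median: %.2f ms -> %.2f ms%n",
                median(WITHOUT_FIX) / 1e6, median(WITH_FIX) / 1e6);
    }
}
```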

-------------

PR Comment: https://git.openjdk.org/jdk17u-dev/pull/3614#issuecomment-3028816808

