[jdk21u-dev] RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention
Daniel Huang
duke at openjdk.org
Wed Jul 2 17:40:44 UTC 2025
On Mon, 30 Jun 2025 21:03:48 GMT, Daniel Huang <duke at openjdk.org> wrote:
> Backport for ShenandoahLock performance regression issue. The fix involves sleeping for a very short duration every 3 yields, with the number of yields picked through manual testing.
>
> Clean backport, ran GHA sanity checks and locally tested `tier1`, `tier2`, and `hotspot_gc_shenandoah`. `test/jdk/java/nio/channels/FileChannel/directio/DirectIOTest.java` sometimes fails locally, but it also sometimes failed before the backport.
> `test/jdk/java/nio/channels/DatagramChannel/SendReceiveMaxSize.java` fails locally, but it also fails locally before the backport.
Sure!
The original fix had code to reproduce the bug:
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
public class Alloc {
static final CountDownLatch startSignal = new CountDownLatch(1);
static final Semaphore semaphore = new Semaphore(128);
static final int THREADS = 1024; //64 threads per CPU core, 16 cores
static final Object[] sinks = new Object[64 * THREADS];
static volatile boolean start;
static volatile boolean stop;
private static void waitOnStartSignal() {
try {
startSignal.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
public static void main(String... args) throws Throwable {
for (int t = 0; t < THREADS; t++) {
int ft = t;
new Thread(() -> work(ft * 64)).start();
}
Thread.sleep(1000);
startSignal.countDown();
Thread.sleep(30_000);
stop = true;
}
public static void work(int idx) {
waitOnStartSignal();
while (!stop) {
semaphore.acquireUninterruptibly();
try {
sinks[idx] = new byte[128];
} catch (Throwable ex) {
throw new RuntimeException(ex);
} finally {
semaphore.release();
}
}
}
}
I ran this on the command line with
`.build/linux-x86_64-server-release/jdk/bin/java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr`
Running this without the fix gives at-safepoint times
22273444
11615507
11297887
10424031
10117190
9789552
9754920
9599965
7477300
6897913
Running with the backported fix gives at-safepoint times
15667088
8279113
3800276
853206
464314
399752
387322
381562
378641
358231
-------------
PR Comment: https://git.openjdk.org/jdk21u-dev/pull/1933#issuecomment-3028728951
More information about the jdk-updates-dev
mailing list