[jdk21u-dev] RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention

Goetz Lindenmaier goetz at openjdk.org
Fri Jul 4 10:25:43 UTC 2025


On Wed, 2 Jul 2025 17:37:42 GMT, Daniel Huang <duke at openjdk.org> wrote:

>> Backport for ShenandoahLock performance regression issue. The fix involves sleeping for a very short duration every 3 yields, with the number of yields picked through manual testing.
>> 
>> Clean backport, ran GHA sanity checks and locally tested `tier1`, `tier2`, and `hotspot_gc_shenandoah`. `test/jdk/java/nio/channels/FileChannel/directio/DirectIOTest.java` sometimes fails locally, but it also sometimes failed before the backport.
>> `test/jdk/java/nio/channels/DatagramChannel/SendReceiveMaxSize.java` fails locally, but it also fails locally before the backport.
>
> Sure!
> 
> The original fix had code to reproduce the bug:
> 
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.Semaphore;
> 
> public class Alloc {
>     static final CountDownLatch startSignal = new CountDownLatch(1);
>     static final Semaphore semaphore = new Semaphore(128);
>     static final int THREADS = 1024; //64 threads per CPU core, 16 cores
>     static final Object[] sinks = new Object[64 * THREADS];
>     static volatile boolean start;
>     static volatile boolean stop;
> 
>     private static void waitOnStartSignal() {
>         try {
>             startSignal.await();
>         } catch (InterruptedException e) {
>             throw new RuntimeException(e);
>         }
>     }
> 
>     public static void main(String... args) throws Throwable {
>         for (int t = 0; t < THREADS; t++) {
>             int ft = t;
>             new Thread(() -> work(ft * 64)).start();
>         }
> 
>         Thread.sleep(1000);
>         startSignal.countDown();
>         Thread.sleep(30_000);
>         stop = true;
>     }
> 
>     public static void work(int idx) {
>         waitOnStartSignal();
>         while (!stop) {
>             semaphore.acquireUninterruptibly();
>             try {
>                 sinks[idx] = new byte[128];
>             } catch (Throwable ex) {
>                 throw new RuntimeException(ex);
>             } finally {
>                 semaphore.release();
>             }
>         }
>     }
> }
> 
> 
> I ran this on the command line with
> `.build/linux-x86_64-server-release/jdk/bin/java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr`
> 
> Running this without the fix gives at-safepoint times
> 
> 22273444
> 11615507
> 11297887
> 10424031
> 10117190
> 9789552
> 9754920
> 9599965
> 7477300
> 6897913
> 
> 
> Running with the backported fix gives at-safepoint times
> 
> 15667088
> 8279113
> 3800276
> 853206
> 464314
> 399752
> 387322
> 381562
> 378641
> 358231

Hi @dtmhuang 
well, that shows that the change does what it intends to do.  That's not my point.
I want to make sure you cause no regressions of any other kind.  jdk25, which you mention wrt. testing is not live yet.
I don't see why we should take any risk for a performance optimization that helps only in rare situations.

-------------

PR Comment: https://git.openjdk.org/jdk21u-dev/pull/1933#issuecomment-3035405536


More information about the jdk-updates-dev mailing list