RFR: 8264760: JVM crashes when two threads encounter the same resolution error [v3]

David Holmes david.holmes at oracle.com
Mon Apr 19 06:47:37 UTC 2021


On 19/04/2021 2:38 pm, Wang Huang wrote:
> On Sat, 17 Apr 2021 07:40:05 GMT, Wang Huang <whuang at openjdk.org> wrote:
> 
>>> test/hotspot/jtreg/runtime/Nestmates/membership/TestNestHostErrorWithMultiThread.java line 58:
>>>
>>>> 56:   public void test() throws Throwable {
>>>> 57:
>>>> 58:     CountDownLatch latch = new CountDownLatch(1);
>>>
>>> The use of CDL doesn't guarantee that both threads have reached the await() before the main thread does the countDown(). If you use a CyclicBarrier instead it will trip when the second thread arrives.
>>
>> @dholmes-ora
> 
>> The use of CDL doesn't guarantee that both threads have reached the await() before the main thread does the countDown(). If you use a CyclicBarrier instead it will trip when the second thread arrives.
> 
> I have tested a small case comparing the `CyclicBarrier` version and the `CountDownLatch` version on `JDK17` and `JDK11`. It seems that:
> * For `JDK17`, the gap between the two threads' wake-up times is bigger in the `CyclicBarrier` version than in the `CountDownLatch` version. In other words, the `CountDownLatch` version can trigger this bug more easily.

So ... thinking about what can happen here

Case 1: countdown latch

The thread that does the countdown() could do that before either thread 
calls await(), between the two calls to await(), or after both calls to 
await(). So how close in time the two threads are to getting the 
nesthost will depend on scheduling and whether either thread actually 
blocked in await(). So I can see how we can just be lucky here and both 
threads sail through the latch at around the same time without needing 
to block.
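A minimal Java sketch of the ordering point in Case 1, using only the standard `java.util.concurrent` API (the class and method names are hypothetical). A zero-timeout `await(long, TimeUnit)` never parks, so it reports whether a thread arriving at that moment would have had to block: before `countDown()` it would block, after it the thread passes straight through.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: whether a thread blocks in a CountDownLatch depends only
// on whether countDown() has already happened when it arrives.
public class LatchOrdering {

    // A zero timeout means await(long, TimeUnit) never parks the caller: it
    // returns true if the count is already zero and false otherwise.
    static boolean passesWithoutBlocking(CountDownLatch latch) {
        try {
            return latch.await(0, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(1);

        // Worker arrives before the release: a plain await() here would block.
        System.out.println("arrives before countDown(): " + passesWithoutBlocking(latch)); // false

        // The releasing thread counts down.
        latch.countDown();

        // Worker arrives after the release: it sails straight through.
        System.out.println("arrives after countDown():  " + passesWithoutBlocking(latch)); // true
    }
}
```

In the test, which of the three orderings you get is up to the scheduler, which is why two threads can both find the latch already open.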

Case 2: cyclic barrier

The first thread to arrive has to block - no choice. The last thread to 
arrive never blocks. So we are actually guaranteed to see worse 
behaviour here as the last thread sails through the barrier and does the 
nesthost lookup well ahead of the thread that had to block.

So while logically the cyclic barrier releases all threads "at the same
time", in terms of implementation we've added a lot of overhead to the
blocked threads (each one has to be woken up and rescheduled before it
can proceed).
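A small Java sketch of the asymmetry in Case 2, relying only on the documented `CyclicBarrier` API (the class and method names are hypothetical). Per the `CyclicBarrier.await()` javadoc, the last thread to arrive gets arrival index 0 and runs the barrier action itself before the others are released, so it is the one party that never goes dormant; the sketch checks that the thread running the action is the one that got index 0.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: the last thread to reach a CyclicBarrier trips it itself.
public class BarrierLastArriver {

    // Returns true if the thread that ran the barrier action is the thread
    // whose await() returned arrival index 0, i.e. the last one to arrive.
    static boolean lastArriverRunsAction() {
        AtomicReference<Thread> actionThread = new AtomicReference<>();
        AtomicReference<Thread> lastArriver = new AtomicReference<>();
        CyclicBarrier barrier =
            new CyclicBarrier(2, () -> actionThread.set(Thread.currentThread()));

        Runnable task = () -> {
            try {
                // Index getParties() - 1 means first to arrive (that thread parks);
                // index 0 means last to arrive (that thread never parks).
                if (barrier.await() == 0) {
                    lastArriver.set(Thread.currentThread());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (BrokenBarrierException e) {
                // The other party was interrupted; nothing to record.
            }
        };

        Thread t1 = new Thread(task, "t1");
        Thread t2 = new Thread(task, "t2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // join() makes the workers' writes visible to this thread.
        return actionThread.get() != null && actionThread.get() == lastArriver.get();
    }

    public static void main(String[] args) {
        System.out.println("last arriver ran the barrier action: " + lastArriverRunsAction());
    }
}
```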

Conversely, the CDL is not guaranteed to give better bug reproducibility
(extreme case: run on a uniprocessor), but in practice, as long as we
have a few CPUs to play with, it will likely behave "better".

Okay - the CDL version is what we should go with.

Sorry to have taken up so much time on this. I didn't expect you to 
benchmark things. :)

Thanks,
David

> * Under `JDK11`, the `CyclicBarrier` version and the `CountDownLatch` version have the same performance.
> 
> // CountDownLatch version
> import java.util.concurrent.CountDownLatch;
> 
> public class TestCountDownLatch {
> 
>    public static void main(String args[]) {
>      CountDownLatch latch1 = new CountDownLatch(1);
>      CountDownLatch latch2 = new CountDownLatch(2);
> 
>      MyThread t1 = new MyThread(latch1, latch2);
>      MyThread t2 = new MyThread(latch1, latch2);
> 
>      t1.start();
>      t2.start();
> 
>      try {
>        // waiting thread creation
>        latch2.await();
>        latch1.countDown();
> 
>        t1.join();
>        t2.join();
>      } catch (InterruptedException e) {}
> 
>      System.out.println(Math.abs(t1.getAwakeTime() - t2.getAwakeTime()));
>    }
> 
>    static class MyThread extends Thread {
>      private CountDownLatch latch1;
>      private CountDownLatch latch2;
>      private long awaketime;
> 
>      MyThread(CountDownLatch latch1, CountDownLatch latch2) {
>        this.latch1 = latch1;
>        this.latch2 = latch2;
>      }
> 
>      @Override
>      public void run() {
>        try {
>          latch2.countDown();
>          // Try to have all threads trigger the nesthost check at the same time
>          latch1.await();
>          awaketime = System.nanoTime();
>        } catch (InterruptedException e) {}
>      }
> 
>      public long getAwakeTime() {
>        return awaketime;
>      }
>    }
> }
> 
> 
> 
> // CyclicBarrier version
> import java.util.concurrent.CyclicBarrier;
> 
> public class TestCyclicBarrier {
> 
>    public static void main(String args[]) {
>      CyclicBarrier barrier = new CyclicBarrier(2);
> 
>      MyThread t1 = new MyThread(barrier);
>      MyThread t2 = new MyThread(barrier);
> 
>      t1.start();
>      t2.start();
> 
>      try {
>        t1.join();
>        t2.join();
>      } catch (InterruptedException e) {}
> 
>      System.out.println(Math.abs(t1.getAwakeTime() - t2.getAwakeTime()));
>    }
> 
>    static class MyThread extends Thread {
>      private CyclicBarrier barrier;
>      private long awaketime;
> 
>      MyThread(CyclicBarrier barrier) {
>        this.barrier = barrier;
>      }
> 
>      @Override
>      public void run() {
>        try {
>          // Try to have all threads trigger the nesthost check at the same time
>          barrier.await();
>          awaketime = System.nanoTime();
>        } catch (Exception e) {}
>      }
> 
>      public long getAwakeTime() {
>        return awaketime;
>      }
>    }
> }
> 
> * Results (absolute difference between the two threads' wake-up times, in ns):
>   - `CountDownLatch` against `CyclicBarrier`
>   ```
>    CountDownLatch: 3960
>    CyclicBarrier: 3361280
>    CountDownLatch: 6120
>    CyclicBarrier: 3337540
>    CountDownLatch: 6160
>    CyclicBarrier: 3462920
>    CountDownLatch: 9150
>    CyclicBarrier: 3328090
>    CountDownLatch: 6580
>    CyclicBarrier: 3345450
>    CountDownLatch: 6340
>    CyclicBarrier: 3342900
>    CountDownLatch: 1330
>    CyclicBarrier: 3379210
>    CountDownLatch: 7780
>    CyclicBarrier: 3219020
>    CountDownLatch: 2460
>    CyclicBarrier: 3297020
>    CountDownLatch: 7320
>    CyclicBarrier: 3332770
>   ```
> 
>   - `JDK17` against `JDK11` (`CyclicBarrier` version)
>   ```
>    CyclicBarrier jdk17: 3188590
>    CyclicBarrier jdk11: 49090
>    CyclicBarrier jdk17: 3123340
>    CyclicBarrier jdk11: 14680
>    CyclicBarrier jdk17: 3107910
>    CyclicBarrier jdk11: 780
>    CyclicBarrier jdk17: 3072600
>    CyclicBarrier jdk11: 1720
>    CyclicBarrier jdk17: 3164340
>    CyclicBarrier jdk11: 41020
>    CyclicBarrier jdk17: 3098490
>    CyclicBarrier jdk11: 7060
>    CyclicBarrier jdk17: 3058220
>    CyclicBarrier jdk11: 14750
>    CyclicBarrier jdk17: 3052460
>    CyclicBarrier jdk11: 660
>    CyclicBarrier jdk17: 3083650
>    CyclicBarrier jdk11: 14670
>    CyclicBarrier jdk17: 3116260
>    CyclicBarrier jdk11: 850
>   ```
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/3392
> 

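To put the quoted numbers side by side, here is a small Java sketch (hypothetical class name) that takes the medians of the ten `CountDownLatch` and ten `CyclicBarrier` samples from the results above. It only summarizes the reported data; it does not re-run the benchmark.

```java
import java.util.Arrays;

// Hypothetical helper: summarizes the wake-up gaps reported in the results above.
public class GapSummary {

    // Median of a sample; averages the two middle values for even-length input.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
    }

    public static void main(String[] args) {
        // Wake-up gaps in ns, copied from the CountDownLatch vs CyclicBarrier results.
        long[] cdl = {3960, 6120, 6160, 9150, 6580, 6340, 1330, 7780, 2460, 7320};
        long[] barrier = {3361280, 3337540, 3462920, 3328090, 3345450,
                          3342900, 3379210, 3219020, 3297020, 3332770};
        long cdlMedian = median(cdl);         // 6250
        long barrierMedian = median(barrier); // 3340220
        System.out.println("CountDownLatch median gap: " + cdlMedian + " ns");
        System.out.println("CyclicBarrier median gap:  " + barrierMedian + " ns");
        System.out.println("ratio: ~" + (barrierMedian / cdlMedian) + "x"); // ~534x
    }
}
```

So in this run the barrier leaves the two threads roughly 3.3 ms apart versus about 6 µs for the latch, which matches the Case 2 reasoning about the blocked thread falling behind.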

More information about the hotspot-runtime-dev mailing list