RFR: JDK-8148992: VM can hang on exit if root region scanning is initiated but not executed

Bengt Rutisson bengt.rutisson at oracle.com
Mon Feb 8 10:15:57 UTC 2016


Hi all,

Could I have a couple of reviews for this change?

http://cr.openjdk.java.net/~brutisso/8148992/webrev.00
https://bugs.openjdk.java.net/browse/JDK-8148992

There are some more details in the bug report, but here's the most 
relevant text:

The reason for the hang is that during shutdown we don't check whether
a root region scan has been initiated but not yet executed.

The ConcurrentMark loop starts like this:

   while (!_should_terminate) {
     // wait until started is set.
     sleepBeforeNextCycle();
     if (_should_terminate) {
       break;
     }

If _should_terminate is true, we just exit without notifying any waiters
on the root region lock. If a GC happens during shutdown, it will hang
waiting for the root region scanning to finish, but the ConcurrentMark
thread has already exited and will never do any root region scanning.

I can trigger this behavior by adding a sleep in the above code:

   while (!_should_terminate) {
     // wait until started is set.
     sleepBeforeNextCycle();
     if (_should_terminate) {
       for (int i = 0; i < 10; i++) {
         os::naked_short_sleep(999);
       }
       break;
     }

and running this small Java program:

import java.util.LinkedList;

public class Repro2 {

     public static LinkedList<byte[]> dummyStore = new LinkedList<>();

     public static void main(String[] args) throws Exception {
         System.out.println("Started");
         for (int i = 0; i < 1024*16; i++) {
             dummyStore.add(new byte[1024]);
         }
         System.out.println("Triggered one YC");

         Thread thread = new Thread(()->System.exit(0));
         thread.start();
         Thread.sleep(100);

         for (int i = 0; i < 1024*16; i++) {
             dummyStore.add(new byte[1024]);
         }
         System.out.println("Triggered Initial mark");

         System.gc(); // Full GC

         System.out.println("Done.");
     }
}


Running with the sleep added and the following command line:

java -Xms16m -Xmx64m -XX:InitiatingHeapOccupancyPercent=0 Repro2

makes the VM hang every time on my workstation.

If I add a "cancel_scan()" method and call it before the ConcurrentMark
thread gives up, the VM no longer hangs. That is, running with this code
makes the VM sleep for a while during shutdown, but it does not hang:


   while (!_should_terminate) {
     // wait until started is set.
     sleepBeforeNextCycle();
     if (_should_terminate) {
       for (int i = 0; i < 10; i++) {
         os::naked_short_sleep(999);
       }
       _cm->root_regions()->cancel_scan();
       break;
     }
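
To illustrate why the notification matters, here is a minimal standalone
Java model of the race. It is only a sketch and not the HotSpot code: the
names RootRegions, prepareForScan, cancelScan, waitUntilScanFinished and
simulateShutdown are hypothetical stand-ins, and the waiter uses a timeout
(an assumption made so the model terminates) where the real GC would wait
indefinitely.

```java
// Toy model only: all names here are hypothetical, not HotSpot APIs.
final class RootRegionScanModel {

    /** Stand-in for the root region state guarded by the root region lock. */
    static final class RootRegions {
        private boolean scanInProgress = false;

        /** Models the initial mark pause requesting a root region scan. */
        synchronized void prepareForScan() {
            scanInProgress = true;
        }

        /** Models the proposed fix: give up the scan and release all waiters. */
        synchronized void cancelScan() {
            scanInProgress = false;
            notifyAll();
        }

        /**
         * Models a GC waiting for the scan to finish. Returns true if the
         * scan ended (or was cancelled) in time, false if the wait timed out.
         */
        synchronized boolean waitUntilScanFinished(long timeoutMillis)
                throws InterruptedException {
            long deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (scanInProgress) {
                long remainingMillis =
                        (deadlineNanos - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // the real VM would still be hanging here
                }
                wait(remainingMillis);
            }
            return true;
        }
    }

    /** Returns true if the modelled GC was released, false if it "hung". */
    static boolean simulateShutdown(boolean cancelOnExit)
            throws InterruptedException {
        RootRegions rootRegions = new RootRegions();
        rootRegions.prepareForScan();

        // Models the marking thread seeing the terminate flag and exiting
        // its loop without ever scanning the root regions.
        Thread markThread = new Thread(() -> {
            if (cancelOnExit) {
                rootRegions.cancelScan();
            }
        });
        markThread.start();
        markThread.join();

        // Models a GC that happens during shutdown.
        return rootRegions.waitUntilScanFinished(200);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without cancel: " + simulateShutdown(false));
        System.out.println("with cancel:    " + simulateShutdown(true));
    }
}
```

In this model the first call times out (prints false) because nobody ever
clears the in-progress flag, while the second call returns immediately
(prints true) because the exiting thread cancels the scan and notifies the
waiters first.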


