RFR: JDK-8148992: VM can hang on exit if root region scanning is initiated but not executed

Bengt Rutisson bengt.rutisson at oracle.com
Wed Feb 10 11:45:12 UTC 2016


Hi Jesper,

On 2016-02-10 12:33, Jesper Wilhelmsson wrote:
> Thanks for cleaning it up! This made the code a lot easier to follow.
> Looks good!
Great! Thanks for looking at this!

Bengt

>
> /Jesper
>
>
> Den 10/2/16 kl. 10:02, skrev Bengt Rutisson:
>>
>> Hi Jesper (and everyone),
>>
>> On 2016-02-08 15:50, Jesper Wilhelmsson wrote:
>>> Looks good!
>>
>> Thanks for looking at this!
>>
>> I found a bug in the last webrev. The 
>> G1ConcurrentMark::scanRootRegions()  has
>> the side effect that it calls 
>> ClassLoaderDataGraph::clear_claimed_marks(). It is
>> not ok to skip this call just because there are no root regions to 
>> scan. So, I
>> can't guard the call to scanRootRegions() with "if
>> (_cm->root_regions()->scan_in_progress())" as I did in webrev.02.
>>
>> I find this side effect a bit odd. So, I moved
>> ClassLoaderDataGraph::clear_claimed_marks() out to its own phase. 
>> This will also
>> make it show up clearer in our logs if it starts taking a long time.
>>
>> Instead of checking scan_in_progress() both before the call to 
>> scanRootRegions()
>> and inside of it I moved the assert(!has_aborted()) in to 
>> scanRootRegions().
>>
>> Here's an updated webrev:
>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.03/
>>
>> And here's a diff compared to the last version:
>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.02-03.diff/
>>
>> Thanks,
>> Bengt
>>
>>>
>>> /Jesper
>>>
>>> Den 8/2/16 kl. 15:14, skrev Bengt Rutisson:
>>>>
>>>> Hi again,
>>>>
>>>> Based on some internal feedback from Thomas I have an updated webrev:
>>>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.02/
>>>>
>>>> Changes are:
>>>>
>>>> - Changed the assert I added to only apply if we would actually do 
>>>> any root
>>>> region scanning
>>>> - Move the notification on the root region lock into a private 
>>>> helper method
>>>> that could be used in both places where this code was duplicated.
>>>> - Removed an obsolete comment in g1CollectedHeap.cpp
>>>>
>>>> Here's a diff compared to the first version:
>>>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.00-02.diff/
>>>>
>>>> (The comment change is only in the full webrev I didn't get it in 
>>>> to the diff.)
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>>
>>>> On 2016-02-08 11:15, Bengt Rutisson wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Could I have a couple of reviews for this change?
>>>>>
>>>>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.00
>>>>> https://bugs.openjdk.java.net/browse/JDK-8148992
>>>>>
>>>>> There are some more details in the bug report, but here's the most 
>>>>> relevant
>>>>> text:
>>>>>
>>>>> The reason for the hang is that during shutdown we don't check the 
>>>>> root region
>>>>> scanning.
>>>>>
>>>>> The ConcurrentMark loop starts like this:
>>>>>
>>>>>   while (!_should_terminate) {
>>>>>     // wait until started is set.
>>>>>     sleepBeforeNextCycle();
>>>>>     if (_should_terminate) {
>>>>>       break;
>>>>>     }
>>>>>
>>>>> If _should_terminate is true we just exit without notifying any 
>>>>> waiters on the
>>>>> root region lock. If a GC happens during shutdown the GC will hang 
>>>>> waiting for
>>>>> the root region scanning to finish but the ConcurrentMark thread 
>>>>> has just
>>>>> exited and will not do any root region scanning.
>>>>>
>>>>> I can trigger this behavior by adding a sleep in the above code:
>>>>>
>>>>>   while (!_should_terminate) {
>>>>>     // wait until started is set.
>>>>>     sleepBeforeNextCycle();
>>>>>     if (_should_terminate) {
>>>>>       for (int i = 0; i < 10; i++) {
>>>>>         os::naked_short_sleep(999);
>>>>>       }
>>>>>       break;
>>>>>     }
>>>>>
>>>>> and running this small java program:
>>>>>
>>>>> import java.util.LinkedList;
>>>>>
>>>>> public class Repro2 {
>>>>>
>>>>>     public static LinkedList<byte[]> dummyStore = new LinkedList<>();
>>>>>
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         System.out.println("Started");
>>>>>         for (int i = 0; i < 1024*16; i++) {
>>>>>             dummyStore.add(new byte[1024]);
>>>>>         }
>>>>>         System.out.println("Triggered one YC");
>>>>>
>>>>>         Thread thread = new Thread(()->System.exit(0));
>>>>>         thread.start();
>>>>>         Thread.sleep(100);
>>>>>
>>>>>         for (int i = 0; i < 1024*16; i++) {
>>>>>             dummyStore.add(new byte[1024]);
>>>>>         }
>>>>>         System.out.println("Triggered Initial mark");
>>>>>
>>>>>         System.gc(); // Full GG
>>>>>
>>>>>         System.out.println("Done.");
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> Running with the sleep added and the following command line:
>>>>>
>>>>> java -Xmx16m -Xmx64m -XX:InitiatingHeapOccupancyPercent=0 Repro2
>>>>>
>>>>> makes the VM hang every time on my workstation.
>>>>>
>>>>> If I add a "cancel_scan()" method and call it before the 
>>>>> ConcurrentMark thread
>>>>> is giving up, the VM does not hang anymore. That is, running with 
>>>>> this code
>>>>> makes the VM sleep a while during shutdown but it does not hang:
>>>>>
>>>>>
>>>>>   while (!_should_terminate) {
>>>>>     // wait until started is set.
>>>>>     sleepBeforeNextCycle();
>>>>>     if (_should_terminate) {
>>>>>       for (int i = 0; i < 10; i++) {
>>>>>         os::naked_short_sleep(999);
>>>>>       }
>>>>>       _cm->root_regions()->cancel_scan();
>>>>>       break;
>>>>>     }
>>>>
>>




More information about the hotspot-gc-dev mailing list