RFR: JDK-8148992: VM can hang on exit if root region scanning is initiated but not executed

Jesper Wilhelmsson jesper.wilhelmsson at oracle.com
Wed Feb 10 11:33:05 UTC 2016


Thanks for cleaning it up! This made the code a lot easier to follow.
Looks good!
/Jesper


Den 10/2/16 kl. 10:02, skrev Bengt Rutisson:
>
> Hi Jesper (and everyone),
>
> On 2016-02-08 15:50, Jesper Wilhelmsson wrote:
>> Looks good!
>
> Thanks for looking at this!
>
> I found a bug in the last webrev. The G1ConcurrentMark::scanRootRegions()  has
> the side effect that it calls ClassLoaderDataGraph::clear_claimed_marks(). It is
> not ok to skip this call just because there are no root regions to scan. So, I
> can't guard the call to scanRootRegions() with "if
> (_cm->root_regions()->scan_in_progress())" as I did in webrev.02.
>
> I find this side effect a bit odd. So, I moved
> ClassLoaderDataGraph::clear_claimed_marks() out to its own phase. This will also
> make it show up clearer in our logs if it starts taking a long time.
>
> Instead of checking scan_in_progress() both before the call to scanRootRegions()
> and inside of it I moved the assert(!has_aborted()) in to scanRootRegions().
>
> Here's an updated webrev:
> http://cr.openjdk.java.net/~brutisso/8148992/webrev.03/
>
> And here's a diff compared to the last version:
> http://cr.openjdk.java.net/~brutisso/8148992/webrev.02-03.diff/
>
> Thanks,
> Bengt
>
>>
>> /Jesper
>>
>> Den 8/2/16 kl. 15:14, skrev Bengt Rutisson:
>>>
>>> Hi again,
>>>
>>> Based on some internal feedback from Thomas I have an updated webrev:
>>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.02/
>>>
>>> Changes are:
>>>
>>> - Changed the assert I added to only apply if we would actually do any root
>>> region scanning
>>> - Move the notification on the root region lock into a private helper method
>>> that could be used in both places where this code was duplicated.
>>> - Removed an obsolete comment in g1CollectedHeap.cpp
>>>
>>> Here's a diff compared to the first version:
>>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.00-02.diff/
>>>
>>> (The comment change is only in the full webrev I didn't get it in to the diff.)
>>>
>>> Thanks,
>>> Bengt
>>>
>>>
>>> On 2016-02-08 11:15, Bengt Rutisson wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Could I have a couple of reviews for this change?
>>>>
>>>> http://cr.openjdk.java.net/~brutisso/8148992/webrev.00
>>>> https://bugs.openjdk.java.net/browse/JDK-8148992
>>>>
>>>> There are some more details in the bug report, but here's the most relevant
>>>> text:
>>>>
>>>> The reason for the hang is that during shutdown we don't check the root region
>>>> scanning.
>>>>
>>>> The ConcurrentMark loop starts like this:
>>>>
>>>>   while (!_should_terminate) {
>>>>     // wait until started is set.
>>>>     sleepBeforeNextCycle();
>>>>     if (_should_terminate) {
>>>>       break;
>>>>     }
>>>>
>>>> If _should_terminate is true we just exit without notifying any waiters on the
>>>> root region lock. If a GC happens during shutdown the GC will hang waiting for
>>>> the root region scanning to finish but the ConcurrentMark thread has just
>>>> exited and will not do any root region scanning.
>>>>
>>>> I can trigger this behavior by adding a sleep in the above code:
>>>>
>>>>   while (!_should_terminate) {
>>>>     // wait until started is set.
>>>>     sleepBeforeNextCycle();
>>>>     if (_should_terminate) {
>>>>       for (int i = 0; i < 10; i++) {
>>>>         os::naked_short_sleep(999);
>>>>       }
>>>>       break;
>>>>     }
>>>>
>>>> and running this small java program:
>>>>
>>>> import java.util.LinkedList;
>>>>
>>>> public class Repro2 {
>>>>
>>>>     public static LinkedList<byte[]> dummyStore = new LinkedList<>();
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>         System.out.println("Started");
>>>>         for (int i = 0; i < 1024*16; i++) {
>>>>             dummyStore.add(new byte[1024]);
>>>>         }
>>>>         System.out.println("Triggered one YC");
>>>>
>>>>         Thread thread = new Thread(()->System.exit(0));
>>>>         thread.start();
>>>>         Thread.sleep(100);
>>>>
>>>>         for (int i = 0; i < 1024*16; i++) {
>>>>             dummyStore.add(new byte[1024]);
>>>>         }
>>>>         System.out.println("Triggered Initial mark");
>>>>
>>>>         System.gc(); // Full GG
>>>>
>>>>         System.out.println("Done.");
>>>>     }
>>>> }
>>>>
>>>>
>>>> Running with the sleep added and the following command line:
>>>>
>>>> java -Xmx16m -Xmx64m -XX:InitiatingHeapOccupancyPercent=0 Repro2
>>>>
>>>> makes the VM hang every time on my workstation.
>>>>
>>>> If I add a "cancel_scan()" method and call it before the ConcurrentMark thread
>>>> is giving up, the VM does not hang anymore. That is, running with this code
>>>> makes the VM sleep a while during shutdown but it does not hang:
>>>>
>>>>
>>>>   while (!_should_terminate) {
>>>>     // wait until started is set.
>>>>     sleepBeforeNextCycle();
>>>>     if (_should_terminate) {
>>>>       for (int i = 0; i < 10; i++) {
>>>>         os::naked_short_sleep(999);
>>>>       }
>>>>       _cm->root_regions()->cancel_scan();
>>>>       break;
>>>>     }
>>>
>



More information about the hotspot-gc-dev mailing list