RFR (S): JDK-8129549 G1: Make sure the concurrent thread does not mix its logging with the STW pauses

Tue Jun 23 12:47:34 UTC 2015

Hi Thomas and Per,

I noticed that there is still a problem with the root-region-end logging 
in my webrev.01 version.

We notified that the root region had finished before we did the "end" 
logging. Thus that last log line could still end up interleaved with the 
evacuation pause logging.

I discussed a few different solutions with Per and StefanK and finally 
we settled on simply moving the logging in inside of 
ConcurrentMark::scanRootRegions().

Here's an updated webrev:
http://cr.openjdk.java.net/~brutisso/8129549/webrev.02/
http://cr.openjdk.java.net/~brutisso/8129549/webrev.01-02.diff/

Thanks,
Bengt

On 2015-06-23 12:34, Bengt Rutisson wrote:
>
> Hi Thomas,
>
> Thanks for looking at this!
>
> On 2015-06-23 11:42, Thomas Schatzl wrote:
>> Hi Bengt,
>>
>> On Tue, 2015-06-23 at 10:09 +0200, Bengt Rutisson wrote:
>>> Hi everyone,
>>>
>>> Could I have a couple of reviews for this patch?
>>>
>>> http://cr.openjdk.java.net/~brutisso/8129549/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8129549
>>>
>>>   From the bug report:
>>>
>>> Currently the ConcurrentMarkThread in G1 does not take part in the
>>> safepoint synchronization for its logging. This means that logging from
>>> the concurrent thread can be interleaved with loggging from a normal GC
>>> pause:
>>>
>>> 1,491: #49: [GC concurrent-root-region-scan-start]
>>> 1,500: #50: [GC pause (G1 Evacuation Pause) (young)1,501: #49: [GC
>>> concurrent-root-region-scan-end, 0,0100111 secs]
>>> 1,501: #49: [GC concurrent-mark-start]
>>> , 0,0162226 secs]
>>>      [Root Region Scan Waiting: 1,1 ms]
>>>      [Parallel Time: 6,1 ms, GC Workers: 23]
>>>
>>>
>>> In the log section above it looks like the root region scanning is
>>> completed inside of the G1 evacuation pause and the concurrent mark is
>>> started while the pause is running.
>>>
>>> This is not really what is happening. In fact the evacuation pause 
>>> waits
>>> for the root region scanning to start before it proceeds with the 
>>> actual
>>> GC work. And the mark tasks that actually do the concurrent marking 
>>> work
>>> will be blocked by the safepoint so, the concurrent marking is not
>>> started until after the evacuation pause.
>>>
>>> So, apart from being a bit ugly this interleaved logging is actually a
>>> bit misleading.
>>>
>>> The root region scanning can be fixed by moving the code that waits for
>>> the root region scanning to complete to the very start of the 
>>> evacuation
>>> pause. Before any logging from the pause is done.
>>>
>>> The other concurrent logging can be fixed by joining the STS before
>>> doing the actual logging.
>>>
>>> Doing these fixes makes the output look like this:
>>>
>>> 1,364: #40: [GC concurrent-root-region-scan-start]
>>> 1,374: #40: [GC concurrent-root-region-scan-end, 0,0106570 secs]
>>> 1,375: #41: [GC pause (G1 Evacuation Pause) (young), 0,0169416 secs]
>>>      [Root Region Scan Waiting: 2,6 ms]
>>>      [Parallel Time: 5,5 ms, GC Workers: 23]
>>>    ...
>>>    [Times: user=0,10 sys=0,00 real=0,02 secs]
>>> 1,392: #40: [GC concurrent-mark-start]
>> Seems okay except for some minor tidbit that you may ignore: in
>> ConcurrentMark.cpp:3019: I would prefer if the message contained
>> "ConcurrentMark::" as there might be multiple methods called abort. Of
>> course, hopefully the stack trace shows the method. Really not sure.
>
> Good point. I'll add "ConcurrentMark::" to the assert message before I 
> push.
>
> Thanks for the review!
> Bengt
>
>> Thanks,
>>    Thomas
>>
>>
>