RFR(S): 7096030: G1: PrintGCDetails enhancements
Bengt Rutisson
bengt.rutisson at oracle.com
Tue Oct 18 08:00:10 UTC 2011
John,
Thanks for cleaning this up. Your changes are definitely a step in the
right direction.
Some comments/questions:
The CR is called "7096030 G1: PrintGCDetails enhancements", but the
destructor improvement is actually unrelated to how the GC details are
printed. I think it is an important change and worth keeping track of.
How do we normally do that? Is it enough to mention it in the summary
of your commit? Would it be ok to file a new CR and close both with your
push? Or should we even push the changes separately?
g1RemSet.hpp
Just for my own education: I see in your CR updates that the cause of
the long other times was contention on the runtime locks. But when I
look at ResourceMark I don't see it using any locks, at least not in
product builds. Which locks got contended?
(Just for the record - I know this is nothing you introduced - but I
think the acronym DCTOC is terrible. I have no chance remembering what
it means. I know now that it is the short form for
DirtyCardToOopClosure, but it takes some mental overhead to figure that
out each time.)
g1CollectorPolicy.cpp
record_collection_pause_end()
The parallel_known_time and parallel_other_time are only used inside if
(parallel), so I would suggest moving the calculations that set them up
inside the if as well. Also, parallel_known_time is very similar to
other_time_ms for the non-parallel case. Since parallel_known_time is
just a partial result used to calculate parallel_other_time, it could
maybe be refactored to be exactly the same as the non-parallel
other_time_ms. This would save some code duplication.
The _par_last_gc_worker_other_times_ms[] array is a bit different from
the other _par_last*[] arrays in that it does not keep track of its
data: the values are calculated on the spot, where they are needed, in
record_collection_pause_end(). So I would suggest making
_par_last_gc_worker_other_times_ms[] a local variable in that method.
Thanks for being consistent with the pause ordering everywhere. Makes it
very easy to read the code when the pause parts are always handled in
the same order.
Since we have been discussing the scope of code cleanup changes, I would
just like to mention that I like the changes you made to
G1CollectorPolicy::print_summary() and
G1CollectorPolicy::print_summary_sd(). To me this is the type of change
that feels natural. You are changing a lot of code in this file and
these methods are not following the same white space standard as the
rest of the file. However, why not go all the way and make the file
consistent by doing the same change to G1CollectorPolicy::avg_value(),
G1CollectorPolicy::max_value(), G1CollectorPolicy::sum_of_values() and
G1CollectorPolicy::max_sum() ?
Bengt
On 2011-10-18 00:20, John Cuthbertson wrote:
> Hi Everyone,
>
> A new webrev for this CR can be found at:
> http://cr.openjdk.java.net/~johnc/7096030/webrev.2/ based upon some
> additional feedback.
>
> Changes in this webrev include:
> * Some minor reorganization in
> G1CollectorPolicy::record_collection_end().
> * Removal of the _satb_drain_time_set flag.
> * In addition to the SATB Drain Time and the # of processed SATB
> buffers, output of Mark Stack Scanning is now dependent on whether
> marking is active.
> * Modified the TraceGen0Time output to be ordered consistently with
> PrintGCDetails.
>
> Testing: specjvm98 and OpenDS; manually checked that the TraceGen0Time
> and TraceGen1Time output values are consistent with summarizing the
> pause data.
>
> Thanks,
>
> JohnC
>
> On 10/14/11 10:17, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> I have a new webrev for this change based upon some initial feedback,
>> which can be found here:
>> http://cr.openjdk.java.net/~johnc/7096030/webrev.1/
>>
>> Changes in this webrev include:
>> * Coalescing the GC Worker Other time and Parallel Other Time into a
>> single "GC Worker Other" time.
>> * Modifying some of the tags in the PrintGCDetails output
>> * The order of the phases in the PrintGCDetails when
>> ParallelGCThreads==0 now matches that of the parallel case.
>>
>> New output:
>>
>> 151.550: [GC pause (young), 0.57693560 secs]
>> [SATB Drain Time: 0.0 ms]
>> [Parallel Time: 553.7 ms]
>> [GC Worker Start (ms): 151550.7 151550.7 151550.7 151551.3
>> Avg: 151550.9, Min: 151550.7, Max: 151551.3, Diff: 0.6]
>> [Ext Root Scanning (ms): 2.8 2.8 3.0 2.6
>> Avg: 2.8, Min: 2.6, Max: 3.0, Diff: 0.4]
>> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0
>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>> [Update RS (ms): 5.1 5.6 3.9 5.3
>> Avg: 5.0, Min: 3.9, Max: 5.6, Diff: 1.7]
>> [Processed Buffers : 3 3 6 5
>> Sum: 17, Avg: 4, Min: 3, Max: 6, Diff: 3]
>> [Scan RS (ms): 14.0 13.7 14.8 13.3
>> Avg: 13.9, Min: 13.3, Max: 14.8, Diff: 1.5]
>> [Object Copy (ms): 531.4 531.2 531.5 531.5
>> Avg: 531.4, Min: 531.2, Max: 531.5, Diff: 0.3]
>> [Termination (ms): 0.0 0.0 0.0 0.0
>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>> [Termination Attempts : 1 1 1 1
>> Sum: 4, Avg: 1, Min: 1, Max: 1, Diff: 0]
>> [GC Worker End (ms): 152104.1 152104.1 152104.0 152104.0
>> Avg: 152104.1, Min: 152104.0, Max: 152104.1, Diff: 0.1]
>> [GC Worker (ms): 553.4 553.4 553.3 552.7
>> Avg: 553.2, Min: 552.7, Max: 553.4, Diff: 0.7]
>> [GC Worker Other (ms): 0.4 0.4 0.5 1.0
>> Avg: 0.6, Min: 0.4, Max: 1.0, Diff: 0.6]
>> [Clear CT: 1.4 ms]
>> [Other: 21.8 ms]
>> [Choose CSet: 0.0 ms]
>> [Ref Proc: 2.8 ms]
>> [Ref Enq: 0.0 ms]
>> [Eden: 1230M(1230M)->0B(1230M) Survivors: 134M->134M Heap:
>> 3312M(4096M)->2216M(4096M)]
>> [Times: user=0.77 sys=0.61, real=0.58 secs]
>>
>> Thanks,
>>
>> JohnC
>>
>> On 10/12/11 09:55, John Cuthbertson wrote:
>>> Hi Everyone,
>>>
>>> Can I have a couple of volunteers review these changes? The webrev
>>> can be found at: http://cr.openjdk.java.net/~johnc/7096030/webrev.0/
>>>
>>> Description:
>>> While looking at the PrintGCDetails output from running some large
>>> applications, it was noticed that there seemed to be a large amount
>>> of time listed as "Parallel Other Time" which couldn't be accounted
>>> for. To account for this I have replaced the single overall
>>> "Parallel Other Time" with two times that are reported for each GC
>>> worker thread:
>>>
>>> GC Worker Other Times
>>> This is the unaccounted time for each GC worker between its start
>>> and end times. It is calculated as "GC Worker Time" minus the sum of
>>> the times for that worker given for "Ext. Root Scanning", "Mark
>>> Stack Scanning", "RS Update", "RS Scanning", "Object Copying", and
>>> "Termination".
>>>
>>> Parallel Other Times
>>> This is the difference between the overall elapsed time for the
>>> parallel phase and the elapsed time for each GC worker. This
>>> includes the overhead of starting and stopping the worker, as well
>>> as executing the destructors of any objects created in the
>>> G1ParTask::work() method.
>>>
>>> Both these times (for each GC worker) should be close to 0.0.
>>>
>>> I was thinking of coalescing these into a single time but decided to
>>> leave them separate as they tell us slightly different things.
>>>
>>> I also changed the order in the PrintGCDetails output to reflect the
>>> order of the different phases within the parallel code. An example
>>> of the new format is:
>>>
>>> 112.115: [GC pause (young) (initial-mark), 0.73699803 secs]
>>> [Parallel Time: 729.9 ms]
>>> [GC Worker Start Time (ms): 112115.5 112115.5 112115.6
>>> 112115.6
>>> Avg: 112115.5, Min: 112115.5, Max: 112115.6, Diff: 0.1]
>>> [Ext Root Scanning (ms): 2.7 2.7 3.2 3.1
>>> Avg: 2.9, Min: 2.7, Max: 3.2, Diff: 0.5]
>>> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0
>>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>>> [Update RS (ms): 7.3 7.5 5.1 5.0
>>> Avg: 6.2, Min: 5.0, Max: 7.5, Diff: 2.5]
>>> [Processed Buffers : 4 6 3 4
>>> Sum: 17, Avg: 4, Min: 3, Max: 6, Diff: 3]
>>> [Scan RS (ms): 14.0 13.6 15.0 15.8
>>> Avg: 14.6, Min: 13.6, Max: 15.8, Diff: 2.2]
>>> [Object Copy (ms): 705.6 705.6 706.1 705.5
>>> Avg: 705.7, Min: 705.5, Max: 706.1, Diff: 0.6]
>>> [Termination (ms): 0.0 0.0 0.0 0.0
>>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>>> [Termination Attempts : 1 1 1 1
>>> Sum: 4, Avg: 1, Min: 1, Max: 1, Diff: 0]
>>> [GC Worker End Time (ms): 112845.0 112845.1 112845.1 112845.1
>>> Avg: 112845.1, Min: 112845.0, Max: 112845.1, Diff: 0.1]
>>> [GC Worker Times (ms): 729.5 729.5 729.5 729.5
>>> Avg: 729.5, Min: 729.5, Max: 729.5, Diff: 0.1]
>>> [GC Worker Other Times (ms): 0.0 0.1 0.1 0.1
>>> Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
>>> [Parallel Other Times (ms): 0.4 0.4 0.4 0.4
>>> Avg: 0.4, Min: 0.4, Max: 0.4, Diff: 0.1]
>>> [Clear CT: 1.1 ms]
>>> [Other: 6.0 ms]
>>> [Choose CSet: 0.0 ms]
>>> [Ref Proc: 0.2 ms]
>>> [Ref Enq: 0.0 ms]
>>> [Eden: 1230M(1230M)->0B(1230M) Survivors: 134M->134M Heap:
>>> 3753M(4096M)->2657M(4096M)]
>>>
>>> Additionally, during RSet scanning, I now stack allocate the
>>> DirtyCardToOopClosure objects that were being allocated in the GC
>>> worker's ResourceArea. Executing the destructor of a ResourceMark
>>> and freeing the space allocated in the worker threads' ResourceArea
>>> objects was the main contributor to the large "Parallel Other Time".
>>>
>>> Testing: OpenDS on linux and Solaris with 4G, 16G, and 26G heaps;
>>> the GC test suite, and jprt.
>>>
>>> Thanks,
>>>
>>> JohnC
>>
>