RFR(S): 7096030: G1: PrintGCDetails enhancements
Bengt Rutisson
bengt.rutisson at oracle.com
Tue Oct 18 08:00:10 UTC 2011
John,
Thanks for cleaning this up. Your changes are definitely a step in the
right direction.
Some comments/questions:
The CR is called "7096030 G1: PrintGCDetails enhancements", but the
destructor improvement is actually unrelated to how the GC details are
printed. I think it is an important change and worth keeping track of.
How do we normally do that? Is it enough to mention it in the summary
of your commit? Would it be ok to file a new CR and close both with your
push? Or should we even push the changes separately?
g1RemSet.hpp
Just for my own education: I see in your CR updates that the cause of
the long other times was contention on the runtime locks. But when I
look at ResourceMark I don't see it using any locks, at least not in
product builds. Which locks got contended?
(Just for the record - I know this is nothing you introduced - but I
think the acronym DCTOC is terrible. I have no chance remembering what
it means. I know now that it is the short form for
DirtyCardToOopClosure, but it takes some mental overhead to figure that
out each time.)
g1CollectorPolicy.cpp
record_collection_pause_end()
The parallel_known_time and parallel_other_time are only used inside if
(parallel), so I would suggest moving the calculations that set them up
inside the if as well. Also, parallel_known_time is very similar to
other_time_ms for the non-parallel case. Since parallel_known_time is
just a partial result used to calculate parallel_other_time, it could
maybe be refactored to be exactly the same as the non-parallel
other_time_ms. This would save some code duplication.
The _par_last_gc_worker_other_times_ms[] array is a bit different from
the other _par_last*[] arrays in that it does not keep track of its
data: the values are calculated on the spot, where they are needed, in
record_collection_pause_end(). So I would suggest making
_par_last_gc_worker_other_times_ms[] a local variable in that method.
Thanks for being consistent with the pause ordering everywhere. Makes it
very easy to read the code when the pause parts are always handled in
the same order.
Since we have been discussing the scope of code cleanup changes, I would
just like to mention that I like the changes you made to
G1CollectorPolicy::print_summary() and
G1CollectorPolicy::print_summary_sd(). To me this is the type of change
that feels natural. You are changing a lot of code in this file and
these methods are not following the same white space standard as the
rest of the file. However, why not go all the way and make the file
consistent by doing the same change to G1CollectorPolicy::avg_value(),
G1CollectorPolicy::max_value(), G1CollectorPolicy::sum_of_values() and
G1CollectorPolicy::max_sum() ?
Bengt
On 2011-10-18 00:20, John Cuthbertson wrote:
> Hi Everyone,
>
> A new webrev for this CR can be found at:
> http://cr.openjdk.java.net/~johnc/7096030/webrev.2/ based upon some
> additional feedback.
>
> Changes in this webrev include:
> * Some minor reorganization in
> G1CollectorPolicy::record_collection_end().
> * Removal of the _satb_drain_time_set flag.
> * In addition to the SATB Drain Time and the # of processed SATB
> buffers, output of Mark Stack Scanning is now dependent on whether
> marking is active.
> * Modified the TraceGen0Time output to be ordered consistently with
> PrintGCDetails.
>
> Testing: specjvm98 and OpenDS; manually checked that the TraceGen0Time
> and TraceGen1Time output values are consistent with summarizing the
> pause data.
>
> Thanks,
>
> JohnC
>
> On 10/14/11 10:17, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> I have a new webrev for this change based upon some initial feedback,
>> which can be found here:
>> http://cr.openjdk.java.net/~johnc/7096030/webrev.1/
>>
>> Changes in this webrev include:
>> * Coalescing the GC Worker Other time and Parallel Other Time into a
>> single "GC Worker Other" time.
>> * Modifying some of the tags in the PrintGCDetails output
>> * The order of the phases in the PrintGCDetails when
>> ParallelGCThreads==0 now matches that of the parallel case.
>>
>> New output:
>>
>> 151.550: [GC pause (young), 0.57693560 secs]
>> [SATB Drain Time: 0.0 ms]
>> [Parallel Time: 553.7 ms]
>> [GC Worker Start (ms): 151550.7 151550.7 151550.7 151551.3
>> Avg: 151550.9, Min: 151550.7, Max: 151551.3, Diff: 0.6]
>> [Ext Root Scanning (ms): 2.8 2.8 3.0 2.6
>> Avg: 2.8, Min: 2.6, Max: 3.0, Diff: 0.4]
>> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0
>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>> [Update RS (ms): 5.1 5.6 3.9 5.3
>> Avg: 5.0, Min: 3.9, Max: 5.6, Diff: 1.7]
>> [Processed Buffers : 3 3 6 5
>> Sum: 17, Avg: 4, Min: 3, Max: 6, Diff: 3]
>> [Scan RS (ms): 14.0 13.7 14.8 13.3
>> Avg: 13.9, Min: 13.3, Max: 14.8, Diff: 1.5]
>> [Object Copy (ms): 531.4 531.2 531.5 531.5
>> Avg: 531.4, Min: 531.2, Max: 531.5, Diff: 0.3]
>> [Termination (ms): 0.0 0.0 0.0 0.0
>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>> [Termination Attempts : 1 1 1 1
>> Sum: 4, Avg: 1, Min: 1, Max: 1, Diff: 0]
>> [GC Worker End (ms): 152104.1 152104.1 152104.0 152104.0
>> Avg: 152104.1, Min: 152104.0, Max: 152104.1, Diff: 0.1]
>> [GC Worker (ms): 553.4 553.4 553.3 552.7
>> Avg: 553.2, Min: 552.7, Max: 553.4, Diff: 0.7]
>> [GC Worker Other (ms): 0.4 0.4 0.5 1.0
>> Avg: 0.6, Min: 0.4, Max: 1.0, Diff: 0.6]
>> [Clear CT: 1.4 ms]
>> [Other: 21.8 ms]
>> [Choose CSet: 0.0 ms]
>> [Ref Proc: 2.8 ms]
>> [Ref Enq: 0.0 ms]
>> [Eden: 1230M(1230M)->0B(1230M) Survivors: 134M->134M Heap:
>> 3312M(4096M)->2216M(4096M)]
>> [Times: user=0.77 sys=0.61, real=0.58 secs]
>>
>> Thanks,
>>
>> JohnC
>>
>> On 10/12/11 09:55, John Cuthbertson wrote:
>>> Hi Everyone,
>>>
>>> Can I have a couple of volunteers review these changes? The webrev
>>> can be found at: http://cr.openjdk.java.net/~johnc/7096030/webrev.0/
>>>
>>> Description:
>>> While looking at the PrintGCDetails output from running some large
>>> applications, it was noticed that there seemed to be a large amount
>>> of time listed as "Parallel Other Time" which couldn't be accounted
>>> for. To account for this I have replaced the single overall
>>> "Parallel Other Time" with two times that are reported for each GC
>>> worker thread:
>>>
>>> GC Worker Other Times
>>> This is the unaccounted time for each GC worker between its start
>>> and end times. It is calculated as "GC Worker Time" minus the sum of
>>> the times for that worker given for "Ext. Root Scanning", "Mark
>>> Stack Scanning", "RS Update", "RS Scanning", "Object Copying", and
>>> "Termination".
>>>
>>> Parallel Other Times
>>> This is the difference between the overall elapsed time for the
>>> parallel phase and the elapsed time for each GC worker. This
>>> includes the overhead of starting and stopping the worker, as well
>>> as executing the destructors of any objects created in the
>>> G1ParTask::work() method.
>>>
>>> Both these times (for each GC worker) should be close to 0.0.
>>>
>>> I was thinking of coalescing these into a single time but decided to
>>> leave them separate as they tell us slightly different things.
>>>
>>> I also changed the order in the PrintGCDetails output to reflect the
>>> order of the different phases within the parallel code. An example
>>> of the new format is:
>>>
>>> 112.115: [GC pause (young) (initial-mark), 0.73699803 secs]
>>> [Parallel Time: 729.9 ms]
>>> [GC Worker Start Time (ms): 112115.5 112115.5 112115.6
>>> 112115.6
>>> Avg: 112115.5, Min: 112115.5, Max: 112115.6, Diff: 0.1]
>>> [Ext Root Scanning (ms): 2.7 2.7 3.2 3.1
>>> Avg: 2.9, Min: 2.7, Max: 3.2, Diff: 0.5]
>>> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0
>>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>>> [Update RS (ms): 7.3 7.5 5.1 5.0
>>> Avg: 6.2, Min: 5.0, Max: 7.5, Diff: 2.5]
>>> [Processed Buffers : 4 6 3 4
>>> Sum: 17, Avg: 4, Min: 3, Max: 6, Diff: 3]
>>> [Scan RS (ms): 14.0 13.6 15.0 15.8
>>> Avg: 14.6, Min: 13.6, Max: 15.8, Diff: 2.2]
>>> [Object Copy (ms): 705.6 705.6 706.1 705.5
>>> Avg: 705.7, Min: 705.5, Max: 706.1, Diff: 0.6]
>>> [Termination (ms): 0.0 0.0 0.0 0.0
>>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>>> [Termination Attempts : 1 1 1 1
>>> Sum: 4, Avg: 1, Min: 1, Max: 1, Diff: 0]
>>> [GC Worker End Time (ms): 112845.0 112845.1 112845.1 112845.1
>>> Avg: 112845.1, Min: 112845.0, Max: 112845.1, Diff: 0.1]
>>> [GC Worker Times (ms): 729.5 729.5 729.5 729.5
>>> Avg: 729.5, Min: 729.5, Max: 729.5, Diff: 0.1]
>>> [GC Worker Other Times (ms): 0.0 0.1 0.1 0.1
>>> Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
>>> [Parallel Other Times (ms): 0.4 0.4 0.4 0.4
>>> Avg: 0.4, Min: 0.4, Max: 0.4, Diff: 0.1]
>>> [Clear CT: 1.1 ms]
>>> [Other: 6.0 ms]
>>> [Choose CSet: 0.0 ms]
>>> [Ref Proc: 0.2 ms]
>>> [Ref Enq: 0.0 ms]
>>> [Eden: 1230M(1230M)->0B(1230M) Survivors: 134M->134M Heap:
>>> 3753M(4096M)->2657M(4096M)]
>>>
>>> Additionally, during RSet scanning, I now stack allocate the
>>> DirtyCardToOopClosure objects that were being allocated in the GC
>>> worker's ResourceArea. Executing the destructor of a ResourceMark
>>> and freeing the space allocated in the worker threads' ResourceArea
>>> objects was the main contributor to the large "Parallel Other Time".
>>>
>>> Testing: OpenDS on linux and Solaris with 4G, 16G, and 26G heaps;
>>> the GC test suite, and jprt.
>>>
>>> Thanks,
>>>
>>> JohnC
>>
>