RFR(S): 8173335: Improve logging for j.l.ref.reference processing

Thu Jun 29 07:56:04 UTC 2017

Hi Thomas,

Thank you very much for the thorough review.

On 06/26/2017 07:42 AM, Thomas Schatzl wrote:
> Hi Sangheon,
>
>    thanks for all your changes, and sorry a bit for the delay...
>
> On Wed, 2017-06-14 at 00:52 -0700, sangheon wrote:
>> Hi Thomas again,
>> On 06/13/2017 02:21 PM, sangheon wrote:
>>> Hi Thomas,
>>>
>>> Thank you for reviewing this.
>>>
>>> On 06/13/2017 04:21 AM, Thomas Schatzl wrote:
>>>> Hi Sangheon,
>>>>
>>>>
>>>> On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote:
>>>>> Hi Aleksey,
>>>>>
>>>>> Thanks for the review.
>>>>>
>>>>> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote:
>>>>>> On 06/10/2017 01:57 AM, sangheon wrote:
>>>>>>> CR:https://bugs.openjdk.java.net/browse/JDK-8173335
>>>>>>> webrev:http://cr.openjdk.java.net/~sangheki/8173335/webrev
>>>>>>> .0
>>>> - There should be a destructor in ReferenceProcessor cleaning up
>>>> the dynamically allocated memory.
>>> Thomas and I had some discussion about this and agreed to file a
>>> separate CR for freeing issue.
>>>
>>> I noticed that there's no destructor when I wrote this, but this is
>>> how we usually implement.
>>> However as this seems incorrect, I will add a destructor for newly
>>> added class but it will not be used in this patch.
>>> It will be used in the following CR(
>>> https://bugs.openjdk.java.net/browse/JDK-8182120  ) which fixes
>>> not-freeing issue in ReferenceProcessor.
>>> FYI, ReferenceProcessor has heap allocated members of
>>> ReferencePolicy(and its friends) but it is not freed too. So
>>> instead of extending this patch, I propose to separate this freeing
>>> issue.
> That's fine, thanks.
>
>>>> - the change should move gc+ref output to something else: there
>>>> is so much additional junk printed with gc+ref=trace so that the
>>>> phase logging is drowned out with real trace information and
>>>> unusable for regular consumption.
>>> Okay, I will add it.
>>> But I asked introducing 'gc+ref+phases' before but you didn't like
>>> it. :) Probably I didn't provide much details?!
> Yes. In the example you showed me earlier with gc+ref=trace the
> examples did not contain the other gc+ref=trace output. That's why I
> thought it would be fine. :)
:)

>>>> - I would prefer if resetting the reference phase times logger
>>>> wouldn't be kind of an afterthought of printing :)
>>>>
>>>> Also it might be useful to keep the data around for somewhat
>>>> longer (not throw it away after every phase). Don't we need the
>>>> data for further analysis?
>>> I don't have strong opinion on this.
>>>
>>> I didn't consider keeping log data for further analysis. This could
>>> a minor reason for supporting keeping log data longer but I think
>>> interspersing with existing G1 log would be the main reason of
>>> keeping it.
>>>
>>>> This would also allow printing it later using different log tags
>>>> (with different formatting).
>>>>
>>>> - I like the split of phasetimes into data storage and printing.
>>>> I do not like that basically the timing data is created twice,
>>>> once for the phasetimes, once for the GCTimer (for JFR
>>>> basically). No, currently timing data is created once and used
>>>> for both phase log  and GCTimer.
>>> Or am I missing something?
>>>
>>> So in summary, mostly I agree with your comments except below 2:
>>> 1. Interspersing with G1 log.
>>> 2. Keeping log data longer. (This should be done if we go with
>>> interspersing idea)
>> I started working on above 2 items. :)
>> I will update webrev when I'm ready.
>>
> Thanks a lot for considering all my comments.
>
> I think the output is much nicer now :)
Thanks!

> Some more notes:
>
> - In the current change (webrev.2) the method with using the
> "direct_print()" getter seems a bit forced only to keep the current
> structure of the code, i.e. printing within the
> ReferenceProcessor::process_references() method.
Right.

> What do you think about moving the printing outside of that method for
> all collectors, just passing a (properly initialized - that allows
> moving the reset() method into gc specific code as well)
> ReferenceProcessorPhaseTimes* that is then later used for printing,
> either directly, or deferred?
Okay, this seems better than current one.
During applying your suggestion, I tweaked a little bit, because giving 
the responsibility of printing logs to the callers seems not that 
natural to me. (I also prepared additional webrev for your original 
suggestion [1])

> At the location where the reference processing is done we know whether
> we need to print directly or deferred. This also hides pretty specific
> information about printing (like indentation level) from the reference
> processing itself.
>
> Also that would maybe allow storing the GCTimer reference somewhere in
> the ReferenceProcessorPhaseTimes so that we only need to pass a single
> container for timing information around.
Good idea, now GCTimer is included in ReferenceProcessorPhaseTimes.

> Overall that may reduce the code quite a bit, keeps similar components
> (GCTimer and ReferenceProcessorPhaseTimes) together without
> ReferenceProcessor needing to know about both of them, and removes the
> ReferenceProcessor "global" reference to the
> ReferenceProcessorPhaseTimes, which is easier to keep track of when
> looking at the code (instead of having the GCTimer passed in and the
> ReferenceProcessorPhaseTimes as member).
>
> The collectors that print immediately probably also can get away with a
> stack-allocated local ReferenceProcessorPhaseTimes, which somewhat
> simplifies their lifecycle management.
Right.

Mostly ReferenceProcessorPhaseTimes will be stack-allocated at the time 
of calling process_discovered_references() or 
enqueue_discovered_references(), except for G1 young GC case.

~ReferenceProcessorPhaseTimes() will not be added in the destructor of 
G1CollectedHeap as we don't have it now. This can be addressed in 
separate CR if needed.

> - could you please tighten the visibility of
> ReferenceProcessorPhaseTimes methods a bit? The getters of that class
> are only ever used in the print* methods, and even some of these print*
> methods are ever called from class local methods.
>
> I think this would drastically decrease the surface of that method.
You are right.
Tried to move to 'private' as many as possible.

> - there seems to be a bug in printing per-thread per-phase worker
> times, the values seem to contain the absolute time at which the list
> has been processed, not a duration. (with -XX:+ParallelRefProcEnabled
> and gc+phases+ref=trace)
>
> [1512.286s][debug][gc,phases,ref] GC(834) Reference Processing: 2.5ms
> [1512.286s][debug][gc,phases,ref] GC(834)   SoftReference: 0.3ms
> [1512.286s][debug][gc,phases,ref] GC(834)     Balance queues: 0.0ms
> [1512.286s][debug][gc,phases,ref] GC(834)     Phase1: 0.3ms
> [1512.286s][trace][gc,phases,ref] GC(834)       Process lists
> (ms)        Min: 1512283.9, Avg: 1512283.9, Max: 1512283.9, Diff:  0.0,
> Sum: 34782529.1, Workers: 23
> [1512.286s][debug][gc,phases,ref] GC(834)     Phase2: 0.3ms
> [1512.286s][trace][gc,phases,ref] GC(834)       Process lists
> (ms)        Min: 1512284.2, Avg: 1512284.2, Max: 1512284.2, Diff:  0.0,
> Sum: 34782535.9, Workers: 23
>
> - in referenceProcessorPhaseTimes.cpp:35: the code reads
>
> if (_worker_time != NULL) {
>    ...
> }
>
> with _worker_time being set to NULL just one line above (same with the
> other constructor).
>
> Not sure.
_worker_time check was remainder of previous change and resulted in 
above bug you pointed.
Thanks for catching this.
Fixed.

> - in RefProcWorkerTimeTracker::~RefProcWorkerTimeTracker: how is it
> possible that _worker_time is NULL? ReferenceProcessorPhaseTimes seems
> to always allocate memory for it.
Fixed.
_worker_time can't be NULL.

> - RefProcPhaseTimesTracker takes the DiscoveredList array as parameter,
> but only ever uses it to determine how many total entries this
> DiscoveredList[] has. So it seems to me that it would be better in the
> name of information hiding if the ReferenceProcessor, which already has
> a total_count() method, would just pass this total instead of the
> entire list.
The problem is that 'before/after' count should be gathered from 
constructor and destructor.
With passing a parameter, constructor could get the total but it is 
impossible from destructor.

But I agree with your point that passing DiscoveredList to get the 
total-count can be enhanced. So I changed to add a new method that 
returns total count at ReferenceProcessor(per ReferenceType). With this 
new approach, we can simplify a bit more. e.g. eliminate 
ReferenceProcessorPhaseTimes::max_gc_threads() and 
total_count_from_list() etc.

> This would also remove the need for the max_gc_counts() getter in
> ReferenceProcessorPhaseTimes afaics too.
[...]

> - "Ref Counts" vs. "Reference Counts" vs. something else in the output
> of the enqueue phase: I would prefer to not use abbreviations. Since we
> already mess up the logging output in a big way, we might also just go
> all the way :P
Changed to use 'Reference Counts'.

Updated webrev:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3_to_2/

Testing: JPRT and local test with all collectors.

[1]: with your suggestion, callers will stack allocate 
ReferenceProcessorPhaseTimes and GCTimer can be included in 
ReferenceProcessorPhaseTimes. i.e. applied only your suggestion
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3b
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3b_to_2/

Thanks,
Sangheon

> Thanks,
>    Thomas
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20170629/26e6f367/attachment.htm>