8214155: GCHeapLog inconsistents with CollectorCounters

Yasumasa Suenaga yasuenag at gmail.com
Thu Nov 22 14:02:45 UTC 2018


Hi Thomas,

I erroneously assumed that `full` value in hs_err GC events means # of STWs.
I will withdraw this CR.

> - the sun.gc.collector.2.name label "G1 stop-the-world phases" seems to
> be just wrong. Looking at
> src/jdk.jcmd/share/classes/sun/tools/jstat/resources/jstat_options, the
> comment for sun.gc.collector.2.* and the actual code indicates that
> this should actually be called something like "G1 concurrent (stop-the-
> world) phases".

I agree with you.
Should I file this to JBS?


> - not sure it is useful to notify about remark and cleanup pause
> separately in the sun.gc.collector.2 statistics - isn't it sufficient
> to have a concurrent cycles count there? (idk about any dependencies).

If sun.gc.collector.2 is not available, first STW phase will not be counted when concurrent GC cycle is aborted: concurrent mode failure.


> - some recent change might have introduced an issue: looking at
> g1ConcurrentMark.cpp:1359 where the "total_collection" count is
> increased there is some comment about races which I do not understand.
> However that count might be more appropriately increased in the Cleanup
> pause. Or not at all there, but rather at the end of the concurrent
> cycle (and isn't this a kind of "full" collection?).
> 
> What would be the compatibility implications of all that?

G1ConcurrentMark::cleanup() seems to be called at safepoint. It has assertion for safepoint. So I guess it will not race.


Thanks,

Yasumasa


On 2018/11/22 18:59, Thomas Schatzl wrote:
> Hi Yasumasa,
> 
> On Wed, 2018-11-21 at 21:58 +0900, Yasumasa Suenaga wrote:
>> Hi all,
>>
>> We can see number of GC events via jcmd VM.info and
>> PerfCounter.print. However they hold different values.
>> (See JDK-8214155 on JBS [1] for more details)
> 
>    during bug triage (making sure bugs are understandable and in general
> ready to be worked on) we had several questions with the description of
> the problem - you might have noticed slight editing already.
> 
> In particular it is unclear to us what the problem and scope of this
> issue is.
> 
> So from what I understand (and there has already been some discussion
> even about this :)), the issue is that the number of GC events
> ("invocations") in the hs_err log does not match the number of jstat
> events.
> 
> It seems that there is something wrong here, but maybe it is not
> (only?) an issue of wrong functionality, but missing expected
> functionality.
> 
> Note that since this functionality is ages-old, none of us in the
> Oracle gc team can give you answers why the output is the way it is -
> probably just a mixture of "just forgot that functionality" (for this
> one), "nobody cared until now" and similar reasons.
> 
> So if we are interested in making the output a bit more useful we might
> want to take a step back and look at a slightly larger context.
> 
> So the first observations just when looking at the output given in the
> description are:
> 
> - the sun.gc.collector.2.name label "G1 stop-the-world phases" seems to
> be just wrong. Looking at
> src/jdk.jcmd/share/classes/sun/tools/jstat/resources/jstat_options, the
> comment for sun.gc.collector.2.* and the actual code indicates that
> this should actually be called something like "G1 concurrent (stop-the-
> world) phases".
> 
> - the description in the CR does not mention why you expect that the
> event counts in the hs_err file should match the jshdb output, and what
> it is used for.
> 
> To my understanding, the "GC events" in the hs_err log print *heap
> sizes* before and after actual collections at the moment. I.e. this is
> log tracing heap changes, not a GC pause log.
> 
> Some digging showed that the given count ("36") is derived from the
> number of total collections that modify the heap; in G1 that is
> incremented during incremental collections, full gc *and* the remark
> pause (because during cleanup the heap does not change, but I think the
> intent is to capture the concurrent cycle as a single "concurrent
> collection").
> 
> This is the reason why the count is 36 and not 52. (i.e. 18 + 2 + 32 /
> 2)
> 
> I do not mind changing all this, but all this needs to be considered
> when reporting issues to make it clear what the problem is and give
> everyone a heads up on what the problem to be solved here. Particularly
> the CR misses reasoning why the "invocations" count should be 52 here
> (remember, this is counting heap changes, not GC events).
> 
> [...]
>> I believe this issue should be fixed for troubleshooters. Of course,
>> they can use `jhsdb jsnap` to check the counters. But I guess most of
>> them believe contents in hs_err log.
> 
> Looking at the problem it might be useful to understand what these
> troubleshooters need.
> 
> My opinion as basis for further discussion: the current "last gc
> events" is almost useless for me at the moment, as it misses some
> crucial information:
> 
> - what kind of pause this event is looking at
> 
> - not sure it is useful to notify about remark and cleanup pause
> separately in the sun.gc.collector.2 statistics - isn't it sufficient
> to have a concurrent cycles count there? (idk about any dependencies).
> 
> - maybe split up the "invocations" (which should be renamed as it is
> confusing) into counts for the respective pauses (i.e. incremental=XX
> full=YY remark=ZZ cleanup=AA). Something like "pauses" would be
> appropriate.
> 
> - (given above investigation it would really be nice to have the
> Cleanup pause included in the count if we do not think just "concurrent
> cycles" is good enough)
> 
> - maybe do not make the text so focused on heap sizes (but still
> include it). I can imagine the text can be made a lot more useful than
> it is now with some input of the community.
> 
> - some recent change might have introduced an issue: looking at
> g1ConcurrentMark.cpp:1359 where the "total_collection" count is
> increased there is some comment about races which I do not understand.
> However that count might be more appropriately increased in the Cleanup
> pause. Or not at all there, but rather at the end of the concurrent
> cycle (and isn't this a kind of "full" collection?).
> 
> What would be the compatibility implications of all that?
> 
>>
>> Do you have any idea for this issue?
> 
> Before proceeding I would like to understand the issue and in
> particular the proposed solution, and if you have any further goals in
> this area or if this is everything you want to do, a bit better.
> 
> As mentioned, only fixing the count in the gc event won't help me much
> as "troubleshooter" though.
> 
> Thanks,
>    Thomas
> 
> 



More information about the hotspot-gc-dev mailing list