RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps

Jesper Wilhelmsson jesper.wilhelmsson at oracle.com
Wed May 28 12:42:51 UTC 2014


I'm fine with this change.

There is a typo present in both new comments in genCollectedHeap.cpp, "thet" 
should be "the".

/Jesper


Bengt Rutisson skrev 23/5/14 07:00:
>
> Hi Jesper and everyone else,
>
> After explaining the problem with GenCollectedHeap in my last email I realized
> that a different approach would be to just assume that the next GC id will be
> the right one when we start the logging before we have incremented the GC id. I
> think that is a nicer way of doing it than "wasting" a GC id as I did before.
>
> Here's an updated webrev with that changed:
> http://cr.openjdk.java.net/~brutisso/8043607/webrev.01/
>
> Here's the diff compared to the last webrev:
> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00-01-diff/
>
> Thanks,
> Bengt
>
> On 5/22/14 9:40 AM, Bengt Rutisson wrote:
>>
>> Hi Jesper,
>>
>> On 5/21/14 7:44 PM, Jesper Wilhelmsson wrote:
>>> Hi Bengt,
>>>
>>> Good change! Thanks for doing this!
>>
>> Thanks for looking at it.
>>
>>>
>>> A few thoughts:
>>>
>>> * There are a few places that passes both a GCTimer and now a GCId. Would it
>>> make sense to pass an info-object of some kind that contains misc info that
>>> is used for logging/tracing? I'm not suggesting that you should implement
>>> that in this change, just a thought for the future.
>>
>> Yes, I think that may be a good future cleanup to do. But would be good as a
>> separate change, just as you point out.
>>
>>>
>>>
>>> * In genCollectdHeap.cpp you increase the gc id only if PrintGCDetails is
>>> enabled. I assume this is the only place where a gc id is used when running
>>> collectors that use GenCollectedHeap::do_collection(). If not, the id created
>>> and used in other places would be different depending on if PrintGCDetails
>>> was enabled or not.
>>
>> Yes, this is a work around for the strange way the logging for
>> GenCollectedHeap works. We do completely different timing and logging for
>> PrintGC and PrintGCDetails. That the logging is different is just strange.
>> That the the timing is different is in my opinion a bug. The pause time
>> reported is shorter if you run with PrintGC than if you run with PrintGCDetails.
>>
>> This mostly affect CMS. The logging for CMS has several other issues as well.
>> I added one comment about that in the bug report. So, I think it would be good
>> to give the CMS logging an overhaul, but that should definitely be a separate
>> change :) And maybe we should hold off with that until the unified logging
>> framework is in place.
>>
>>>
>>> Anyway, looks good!
>> Thanks!
>>
>> Bengt
>>
>>> Ship it!
>>> /Jesper
>>>
>>>
>>> Bengt Rutisson skrev 21/5/14 11:31:
>>>>
>>>> Hi everyone,
>>>>
>>>> Can I have some reviews for this patch:
>>>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8043607
>>>>
>>>> This is mostly a GC change but I am sending it to the broader list since I
>>>> added
>>>> a method to ostream.hpp/cpp. We had this code duplicated in many places in the
>>>> GC code:
>>>>
>>>> gclog_or_tty->date_stamp(PrintGCDateStamps);
>>>> gclog_or_tty->stamp(PrintGCTimeStamps);
>>>>
>>>> And with my patch I had to add one more section to this code duplication.
>>>> Instead I wrapped all three calls in a method called gclog_stamp(). It is a bit
>>>> hard to know where to place that method. ostream seems like the wrong place for
>>>> GC specific code, but there is already lots of GC specific code there and even
>>>> gclog_or_tty is declared there. So I added the method there for now:
>>>>
>>>> void outputStream::gclog_stamp(uint gc_id) {
>>>>    date_stamp(PrintGCDateStamps);
>>>>    stamp(PrintGCTimeStamps);
>>>>    if (PrintGCID) {
>>>>      print("#%u: ", gc_id);
>>>>    }
>>>> }
>>>>
>>>> Other than that this is a pure GC change.
>>>>
>>>>
>>>> Background:
>>>>
>>>> With event based tracing we introduced a GC id field on all GC events. This id
>>>> is a constantly increasing number associated with each GC. I figured we can use
>>>> the this GC id as a decoration for the HotSpot GC logging similarly to what we
>>>> do with PrintGCTimeStamps and PrintGCDateStamps.
>>>>
>>>> A nice side effect of this is that it will make it easy to map an event in the
>>>> event tracing to an entry in the GC log. Currently this is impossible to do
>>>> deterministically since the logging and event tracing use different timing.
>>>>
>>>> I propose to add a flag called PrintGCID. Here's an example log entry for GC
>>>> number 4 running with -XX:+PrintGCID:
>>>>
>>>> #4: [GC pause  (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420 secs]
>>>>
>>>> It is a decoration similar to the timestamps. Here's what it looks like if you
>>>> add -XX:+PrintGCTimeStamps:
>>>>
>>>> 3,921: #4: [GC pause  (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420
>>>> secs]
>>>>
>>>> And here's what it will look like if you add -XX:+PrintGCDateStamps:
>>>>
>>>> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause  (G1 Evacuation Pause)
>>>> (young) 185M->212M(502M), 0,2373420 secs]
>>>>
>>>>
>>>> Here's a little longer example running ParallelGC:
>>>>
>>>> #0: [GC (Allocation Failure)  129024K->2980K(493056K), 0,1050244 secs]
>>>> #1: [GC (Allocation Failure)  132004K->3731K(622080K), 0,0389474 secs]
>>>> #2: [GC (Allocation Failure)  261779K->27153K(622080K), 0,2057294 secs]
>>>> #3: [GC (Allocation Failure)  285201K->224350K(880128K), 0,2410144 secs]
>>>> #4: [Full GC (Ergonomics)  224350K->219198K(1154560K), 2,7104481 secs]
>>>> #5: [GC (Allocation Failure)  735294K->638073K(863744K), 0,7707819 secs]
>>>>
>>>> Note that for the concurrent GCs this gives a good way of seeing the concurrent
>>>> action. Here is a G1 example:
>>>>
>>>> #0: [GC pause  (G1 Evacuation Pause) (young) 24M->2881K(502M), 0,0383157 secs]
>>>> #1: [GC pause  (G1 Evacuation Pause) (young) 48M->4745K(502M), 0,0412220 secs]
>>>> #2: [GC pause  (G1 Evacuation Pause) (young) 136M->9618K(502M), 0,0426144 secs]
>>>> #3: [GC pause  (G1 Evacuation Pause) (young) 248M->143M(502M), 0,1467633 secs]
>>>> #4: [GC pause  (G1 Evacuation Pause) (young) 210M->199M(502M), 0,1627465 secs]
>>>> #5: [GC pause  (G1 Evacuation Pause) (young) 213M->235M(502M), 0,2376728 secs]
>>>> #6: [GC pause  (G1 Evacuation Pause) (young) (initial-mark) 255M->269M(1004M),
>>>> 0,2623779 secs]
>>>> #7: [GC concurrent-root-region-scan-start]
>>>> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
>>>> #7: [GC concurrent-mark-start]
>>>> #8: [GC pause  (G1 Evacuation Pause) (young) 315M->330M(2008M), 0,3007405 secs]
>>>> #9: [GC pause  (G1 Evacuation Pause) (young) 422M->418M(3212M), 0,4063937 secs]
>>>> #10: [GC pause  (G1 Evacuation Pause) (young) 564M->572M(4176M), 0,7500609
>>>> secs]
>>>> #11: [GC pause  (G1 Evacuation Pause) (young) 760M->756M(4946M), 1,5464884
>>>> secs]
>>>> #7: [GC concurrent-mark-end, 3,9464599 secs]
>>>> #7: [GC remark, 0,0240462 secs]
>>>> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
>>>> #7: [GC concurrent-cleanup-start]
>>>> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
>>>> #12: [GC pause  (G1 Evacuation Pause) (young) 977M->978M(5562M), 2,2464423
>>>> secs]
>>>>
>>>> Notice how the complete concurrent cycle is decorated with id #7. That makes it
>>>> possible to follow the concurrent work more easily and grep it out etc.
>>>>
>>>>
>>>> Here's a similar thing for CMS where the concurrent cycle #22 has a young GC
>>>> #23
>>>> in the middle of it:
>>>>
>>>> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K), 1,4558792 secs]
>>>> #22: [GC (CMS Initial Mark)  1336515K(2452584K), 0,1107938 secs]
>>>> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K), 0,7909961 secs]
>>>> #22: [GC (CMS Final Remark)  1304871K(2452584K), 0,2353840 secs]
>>>>
>>>>
>>>> I think this is very helpful information to have, so I would like to push this
>>>> with the new flag PrintGCID as true by default. I'm sure some parsers will
>>>> break, but it is an easy work around to run with -XX:-PrintGCID to get the old
>>>> format. It should even be pretty easy to do post processing on a log file to
>>>> remove the GC id information. Having it on by default makes it possible for us
>>>> to use this information and in particular be able to map it to the JFR
>>>> events we
>>>> see in MissionControl.
>>>>
>>>> I would also like to backport this to 8u40, but in that case with the default
>>>> value of PrintGCID as false.
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>


More information about the hotspot-dev mailing list