RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps

Bengt Rutisson bengt.rutisson at oracle.com
Fri May 23 05:00:44 UTC 2014


Hi Jesper and everyone else,

After explaining the problem with GenCollectedHeap in my last email I 
realized that a different approach would be to just assume that the next 
GC id will be the right one when we start the logging before we have 
incremented the GC id. I think that is a nicer way of doing it than 
"wasting" a GC id as I did before.

Here's an updated webrev with that changed:
http://cr.openjdk.java.net/~brutisso/8043607/webrev.01/

Here's the diff compared to the last webrev:
http://cr.openjdk.java.net/~brutisso/8043607/webrev.00-01-diff/

Thanks,
Bengt

On 5/22/14 9:40 AM, Bengt Rutisson wrote:
>
> Hi Jesper,
>
> On 5/21/14 7:44 PM, Jesper Wilhelmsson wrote:
>> Hi Bengt,
>>
>> Good change! Thanks for doing this!
>
> Thanks for looking at it.
>
>>
>> A few thoughts:
>>
>> * There are a few places that passes both a GCTimer and now a GCId. 
>> Would it make sense to pass an info-object of some kind that contains 
>> misc info that is used for logging/tracing? I'm not suggesting that 
>> you should implement that in this change, just a thought for the future.
>
> Yes, I think that may be a good future cleanup to do. But would be 
> good as a separate change, just as you point out.
>
>>
>>
>> * In genCollectdHeap.cpp you increase the gc id only if 
>> PrintGCDetails is enabled. I assume this is the only place where a gc 
>> id is used when running collectors that use 
>> GenCollectedHeap::do_collection(). If not, the id created and used in 
>> other places would be different depending on if PrintGCDetails was 
>> enabled or not.
>
> Yes, this is a work around for the strange way the logging for 
> GenCollectedHeap works. We do completely different timing and logging 
> for PrintGC and PrintGCDetails. That the logging is different is just 
> strange. That the the timing is different is in my opinion a bug. The 
> pause time reported is shorter if you run with PrintGC than if you run 
> with PrintGCDetails.
>
> This mostly affect CMS. The logging for CMS has several other issues 
> as well. I added one comment about that in the bug report. So, I think 
> it would be good to give the CMS logging an overhaul, but that should 
> definitely be a separate change :) And maybe we should hold off with 
> that until the unified logging framework is in place.
>
>>
>> Anyway, looks good!
> Thanks!
>
> Bengt
>
>> Ship it!
>> /Jesper
>>
>>
>> Bengt Rutisson skrev 21/5/14 11:31:
>>>
>>> Hi everyone,
>>>
>>> Can I have some reviews for this patch:
>>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8043607
>>>
>>> This is mostly a GC change but I am sending it to the broader list 
>>> since I added
>>> a method to ostream.hpp/cpp. We had this code duplicated in many 
>>> places in the
>>> GC code:
>>>
>>> gclog_or_tty->date_stamp(PrintGCDateStamps);
>>> gclog_or_tty->stamp(PrintGCTimeStamps);
>>>
>>> And with my patch I had to add one more section to this code 
>>> duplication.
>>> Instead I wrapped all three calls in a method called gclog_stamp(). 
>>> It is a bit
>>> hard to know where to place that method. ostream seems like the 
>>> wrong place for
>>> GC specific code, but there is already lots of GC specific code 
>>> there and even
>>> gclog_or_tty is declared there. So I added the method there for now:
>>>
>>> void outputStream::gclog_stamp(uint gc_id) {
>>>    date_stamp(PrintGCDateStamps);
>>>    stamp(PrintGCTimeStamps);
>>>    if (PrintGCID) {
>>>      print("#%u: ", gc_id);
>>>    }
>>> }
>>>
>>> Other than that this is a pure GC change.
>>>
>>>
>>> Background:
>>>
>>> With event based tracing we introduced a GC id field on all GC 
>>> events. This id
>>> is a constantly increasing number associated with each GC. I figured 
>>> we can use
>>> the this GC id as a decoration for the HotSpot GC logging similarly 
>>> to what we
>>> do with PrintGCTimeStamps and PrintGCDateStamps.
>>>
>>> A nice side effect of this is that it will make it easy to map an 
>>> event in the
>>> event tracing to an entry in the GC log. Currently this is 
>>> impossible to do
>>> deterministically since the logging and event tracing use different 
>>> timing.
>>>
>>> I propose to add a flag called PrintGCID. Here's an example log 
>>> entry for GC
>>> number 4 running with -XX:+PrintGCID:
>>>
>>> #4: [GC pause  (G1 Evacuation Pause) (young) 185M->212M(502M), 
>>> 0,2373420 secs]
>>>
>>> It is a decoration similar to the timestamps. Here's what it looks 
>>> like if you
>>> add -XX:+PrintGCTimeStamps:
>>>
>>> 3,921: #4: [GC pause  (G1 Evacuation Pause) (young) 
>>> 185M->212M(502M), 0,2373420
>>> secs]
>>>
>>> And here's what it will look like if you add -XX:+PrintGCDateStamps:
>>>
>>> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause  (G1 Evacuation 
>>> Pause)
>>> (young) 185M->212M(502M), 0,2373420 secs]
>>>
>>>
>>> Here's a little longer example running ParallelGC:
>>>
>>> #0: [GC (Allocation Failure)  129024K->2980K(493056K), 0,1050244 secs]
>>> #1: [GC (Allocation Failure)  132004K->3731K(622080K), 0,0389474 secs]
>>> #2: [GC (Allocation Failure)  261779K->27153K(622080K), 0,2057294 secs]
>>> #3: [GC (Allocation Failure)  285201K->224350K(880128K), 0,2410144 
>>> secs]
>>> #4: [Full GC (Ergonomics)  224350K->219198K(1154560K), 2,7104481 secs]
>>> #5: [GC (Allocation Failure)  735294K->638073K(863744K), 0,7707819 
>>> secs]
>>>
>>> Note that for the concurrent GCs this gives a good way of seeing the 
>>> concurrent
>>> action. Here is a G1 example:
>>>
>>> #0: [GC pause  (G1 Evacuation Pause) (young) 24M->2881K(502M), 
>>> 0,0383157 secs]
>>> #1: [GC pause  (G1 Evacuation Pause) (young) 48M->4745K(502M), 
>>> 0,0412220 secs]
>>> #2: [GC pause  (G1 Evacuation Pause) (young) 136M->9618K(502M), 
>>> 0,0426144 secs]
>>> #3: [GC pause  (G1 Evacuation Pause) (young) 248M->143M(502M), 
>>> 0,1467633 secs]
>>> #4: [GC pause  (G1 Evacuation Pause) (young) 210M->199M(502M), 
>>> 0,1627465 secs]
>>> #5: [GC pause  (G1 Evacuation Pause) (young) 213M->235M(502M), 
>>> 0,2376728 secs]
>>> #6: [GC pause  (G1 Evacuation Pause) (young) (initial-mark) 
>>> 255M->269M(1004M),
>>> 0,2623779 secs]
>>> #7: [GC concurrent-root-region-scan-start]
>>> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
>>> #7: [GC concurrent-mark-start]
>>> #8: [GC pause  (G1 Evacuation Pause) (young) 315M->330M(2008M), 
>>> 0,3007405 secs]
>>> #9: [GC pause  (G1 Evacuation Pause) (young) 422M->418M(3212M), 
>>> 0,4063937 secs]
>>> #10: [GC pause  (G1 Evacuation Pause) (young) 564M->572M(4176M), 
>>> 0,7500609 secs]
>>> #11: [GC pause  (G1 Evacuation Pause) (young) 760M->756M(4946M), 
>>> 1,5464884 secs]
>>> #7: [GC concurrent-mark-end, 3,9464599 secs]
>>> #7: [GC remark, 0,0240462 secs]
>>> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
>>> #7: [GC concurrent-cleanup-start]
>>> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
>>> #12: [GC pause  (G1 Evacuation Pause) (young) 977M->978M(5562M), 
>>> 2,2464423 secs]
>>>
>>> Notice how the complete concurrent cycle is decorated with id #7. 
>>> That makes it
>>> possible to follow the concurrent work more easily and grep it out etc.
>>>
>>>
>>> Here's a similar thing for CMS where the concurrent cycle #22 has a 
>>> young GC #23
>>> in the middle of it:
>>>
>>> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K), 
>>> 1,4558792 secs]
>>> #22: [GC (CMS Initial Mark)  1336515K(2452584K), 0,1107938 secs]
>>> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K), 
>>> 0,7909961 secs]
>>> #22: [GC (CMS Final Remark)  1304871K(2452584K), 0,2353840 secs]
>>>
>>>
>>> I think this is very helpful information to have, so I would like to 
>>> push this
>>> with the new flag PrintGCID as true by default. I'm sure some 
>>> parsers will
>>> break, but it is an easy work around to run with -XX:-PrintGCID to 
>>> get the old
>>> format. It should even be pretty easy to do post processing on a log 
>>> file to
>>> remove the GC id information. Having it on by default makes it 
>>> possible for us
>>> to use this information and in particular be able to map it to the 
>>> JFR events we
>>> see in MissionControl.
>>>
>>> I would also like to backport this to 8u40, but in that case with 
>>> the default
>>> value of PrintGCID as false.
>>>
>>> Thanks,
>>> Bengt
>>>
>>>
>>>
>>>
>>>
>>>
>



More information about the hotspot-dev mailing list