RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps
Bengt Rutisson
bengt.rutisson at oracle.com
Thu May 22 07:40:48 UTC 2014
Hi Jesper,
On 5/21/14 7:44 PM, Jesper Wilhelmsson wrote:
> Hi Bengt,
>
> Good change! Thanks for doing this!
Thanks for looking at it.
>
> A few thoughts:
>
> * There are a few places that passes both a GCTimer and now a GCId.
> Would it make sense to pass an info-object of some kind that contains
> misc info that is used for logging/tracing? I'm not suggesting that
> you should implement that in this change, just a thought for the future.
Yes, I think that may be a good future cleanup to do. But would be good
as a separate change, just as you point out.
>
>
> * In genCollectdHeap.cpp you increase the gc id only if PrintGCDetails
> is enabled. I assume this is the only place where a gc id is used when
> running collectors that use GenCollectedHeap::do_collection(). If not,
> the id created and used in other places would be different depending
> on if PrintGCDetails was enabled or not.
Yes, this is a work around for the strange way the logging for
GenCollectedHeap works. We do completely different timing and logging
for PrintGC and PrintGCDetails. That the logging is different is just
strange. That the the timing is different is in my opinion a bug. The
pause time reported is shorter if you run with PrintGC than if you run
with PrintGCDetails.
This mostly affect CMS. The logging for CMS has several other issues as
well. I added one comment about that in the bug report. So, I think it
would be good to give the CMS logging an overhaul, but that should
definitely be a separate change :) And maybe we should hold off with
that until the unified logging framework is in place.
>
> Anyway, looks good!
Thanks!
Bengt
> Ship it!
> /Jesper
>
>
> Bengt Rutisson skrev 21/5/14 11:31:
>>
>> Hi everyone,
>>
>> Can I have some reviews for this patch:
>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
>>
>> https://bugs.openjdk.java.net/browse/JDK-8043607
>>
>> This is mostly a GC change but I am sending it to the broader list
>> since I added
>> a method to ostream.hpp/cpp. We had this code duplicated in many
>> places in the
>> GC code:
>>
>> gclog_or_tty->date_stamp(PrintGCDateStamps);
>> gclog_or_tty->stamp(PrintGCTimeStamps);
>>
>> And with my patch I had to add one more section to this code
>> duplication.
>> Instead I wrapped all three calls in a method called gclog_stamp().
>> It is a bit
>> hard to know where to place that method. ostream seems like the wrong
>> place for
>> GC specific code, but there is already lots of GC specific code there
>> and even
>> gclog_or_tty is declared there. So I added the method there for now:
>>
>> void outputStream::gclog_stamp(uint gc_id) {
>> date_stamp(PrintGCDateStamps);
>> stamp(PrintGCTimeStamps);
>> if (PrintGCID) {
>> print("#%u: ", gc_id);
>> }
>> }
>>
>> Other than that this is a pure GC change.
>>
>>
>> Background:
>>
>> With event based tracing we introduced a GC id field on all GC
>> events. This id
>> is a constantly increasing number associated with each GC. I figured
>> we can use
>> the this GC id as a decoration for the HotSpot GC logging similarly
>> to what we
>> do with PrintGCTimeStamps and PrintGCDateStamps.
>>
>> A nice side effect of this is that it will make it easy to map an
>> event in the
>> event tracing to an entry in the GC log. Currently this is impossible
>> to do
>> deterministically since the logging and event tracing use different
>> timing.
>>
>> I propose to add a flag called PrintGCID. Here's an example log entry
>> for GC
>> number 4 running with -XX:+PrintGCID:
>>
>> #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M),
>> 0,2373420 secs]
>>
>> It is a decoration similar to the timestamps. Here's what it looks
>> like if you
>> add -XX:+PrintGCTimeStamps:
>>
>> 3,921: #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M),
>> 0,2373420
>> secs]
>>
>> And here's what it will look like if you add -XX:+PrintGCDateStamps:
>>
>> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause (G1 Evacuation
>> Pause)
>> (young) 185M->212M(502M), 0,2373420 secs]
>>
>>
>> Here's a little longer example running ParallelGC:
>>
>> #0: [GC (Allocation Failure) 129024K->2980K(493056K), 0,1050244 secs]
>> #1: [GC (Allocation Failure) 132004K->3731K(622080K), 0,0389474 secs]
>> #2: [GC (Allocation Failure) 261779K->27153K(622080K), 0,2057294 secs]
>> #3: [GC (Allocation Failure) 285201K->224350K(880128K), 0,2410144 secs]
>> #4: [Full GC (Ergonomics) 224350K->219198K(1154560K), 2,7104481 secs]
>> #5: [GC (Allocation Failure) 735294K->638073K(863744K), 0,7707819 secs]
>>
>> Note that for the concurrent GCs this gives a good way of seeing the
>> concurrent
>> action. Here is a G1 example:
>>
>> #0: [GC pause (G1 Evacuation Pause) (young) 24M->2881K(502M),
>> 0,0383157 secs]
>> #1: [GC pause (G1 Evacuation Pause) (young) 48M->4745K(502M),
>> 0,0412220 secs]
>> #2: [GC pause (G1 Evacuation Pause) (young) 136M->9618K(502M),
>> 0,0426144 secs]
>> #3: [GC pause (G1 Evacuation Pause) (young) 248M->143M(502M),
>> 0,1467633 secs]
>> #4: [GC pause (G1 Evacuation Pause) (young) 210M->199M(502M),
>> 0,1627465 secs]
>> #5: [GC pause (G1 Evacuation Pause) (young) 213M->235M(502M),
>> 0,2376728 secs]
>> #6: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
>> 255M->269M(1004M),
>> 0,2623779 secs]
>> #7: [GC concurrent-root-region-scan-start]
>> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
>> #7: [GC concurrent-mark-start]
>> #8: [GC pause (G1 Evacuation Pause) (young) 315M->330M(2008M),
>> 0,3007405 secs]
>> #9: [GC pause (G1 Evacuation Pause) (young) 422M->418M(3212M),
>> 0,4063937 secs]
>> #10: [GC pause (G1 Evacuation Pause) (young) 564M->572M(4176M),
>> 0,7500609 secs]
>> #11: [GC pause (G1 Evacuation Pause) (young) 760M->756M(4946M),
>> 1,5464884 secs]
>> #7: [GC concurrent-mark-end, 3,9464599 secs]
>> #7: [GC remark, 0,0240462 secs]
>> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
>> #7: [GC concurrent-cleanup-start]
>> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
>> #12: [GC pause (G1 Evacuation Pause) (young) 977M->978M(5562M),
>> 2,2464423 secs]
>>
>> Notice how the complete concurrent cycle is decorated with id #7.
>> That makes it
>> possible to follow the concurrent work more easily and grep it out etc.
>>
>>
>> Here's a similar thing for CMS where the concurrent cycle #22 has a
>> young GC #23
>> in the middle of it:
>>
>> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K),
>> 1,4558792 secs]
>> #22: [GC (CMS Initial Mark) 1336515K(2452584K), 0,1107938 secs]
>> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K),
>> 0,7909961 secs]
>> #22: [GC (CMS Final Remark) 1304871K(2452584K), 0,2353840 secs]
>>
>>
>> I think this is very helpful information to have, so I would like to
>> push this
>> with the new flag PrintGCID as true by default. I'm sure some parsers
>> will
>> break, but it is an easy work around to run with -XX:-PrintGCID to
>> get the old
>> format. It should even be pretty easy to do post processing on a log
>> file to
>> remove the GC id information. Having it on by default makes it
>> possible for us
>> to use this information and in particular be able to map it to the
>> JFR events we
>> see in MissionControl.
>>
>> I would also like to backport this to 8u40, but in that case with the
>> default
>> value of PrintGCID as false.
>>
>> Thanks,
>> Bengt
>>
>>
>>
>>
>>
>>
More information about the hotspot-dev
mailing list