RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps
Kirk Pepperdine
kirk at kodewerk.com
Wed May 21 14:12:26 UTC 2014
Hi Bengt,
This change looks very useful.
Kind regards,
Kirk
On May 21, 2014, at 11:31 AM, Bengt Rutisson <bengt.rutisson at oracle.com> wrote:
>
> Hi everyone,
>
> Can I have some reviews for this patch:
> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
>
> https://bugs.openjdk.java.net/browse/JDK-8043607
>
> This is mostly a GC change but I am sending it to the broader list since I added a method to ostream.hpp/cpp. We had this code duplicated in many places in the GC code:
>
> gclog_or_tty->date_stamp(PrintGCDateStamps);
> gclog_or_tty->stamp(PrintGCTimeStamps);
>
> And with my patch I had to add one more section to this code duplication. Instead I wrapped all three calls in a method called gclog_stamp(). It is a bit hard to know where to place that method. ostream seems like the wrong place for GC specific code, but there is already lots of GC specific code there and even gclog_or_tty is declared there. So I added the method there for now:
>
> void outputStream::gclog_stamp(uint gc_id) {
> date_stamp(PrintGCDateStamps);
> stamp(PrintGCTimeStamps);
> if (PrintGCID) {
> print("#%u: ", gc_id);
> }
> }
>
> Other than that this is a pure GC change.
>
>
> Background:
>
> With event based tracing we introduced a GC id field on all GC events. This id is a constantly increasing number associated with each GC. I figured we can use the this GC id as a decoration for the HotSpot GC logging similarly to what we do with PrintGCTimeStamps and PrintGCDateStamps.
>
> A nice side effect of this is that it will make it easy to map an event in the event tracing to an entry in the GC log. Currently this is impossible to do deterministically since the logging and event tracing use different timing.
>
> I propose to add a flag called PrintGCID. Here's an example log entry for GC number 4 running with -XX:+PrintGCID:
>
> #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420 secs]
>
> It is a decoration similar to the timestamps. Here's what it looks like if you add -XX:+PrintGCTimeStamps:
>
> 3,921: #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420 secs]
>
> And here's what it will look like if you add -XX:+PrintGCDateStamps:
>
> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420 secs]
>
>
> Here's a little longer example running ParallelGC:
>
> #0: [GC (Allocation Failure) 129024K->2980K(493056K), 0,1050244 secs]
> #1: [GC (Allocation Failure) 132004K->3731K(622080K), 0,0389474 secs]
> #2: [GC (Allocation Failure) 261779K->27153K(622080K), 0,2057294 secs]
> #3: [GC (Allocation Failure) 285201K->224350K(880128K), 0,2410144 secs]
> #4: [Full GC (Ergonomics) 224350K->219198K(1154560K), 2,7104481 secs]
> #5: [GC (Allocation Failure) 735294K->638073K(863744K), 0,7707819 secs]
>
> Note that for the concurrent GCs this gives a good way of seeing the concurrent action. Here is a G1 example:
>
> #0: [GC pause (G1 Evacuation Pause) (young) 24M->2881K(502M), 0,0383157 secs]
> #1: [GC pause (G1 Evacuation Pause) (young) 48M->4745K(502M), 0,0412220 secs]
> #2: [GC pause (G1 Evacuation Pause) (young) 136M->9618K(502M), 0,0426144 secs]
> #3: [GC pause (G1 Evacuation Pause) (young) 248M->143M(502M), 0,1467633 secs]
> #4: [GC pause (G1 Evacuation Pause) (young) 210M->199M(502M), 0,1627465 secs]
> #5: [GC pause (G1 Evacuation Pause) (young) 213M->235M(502M), 0,2376728 secs]
> #6: [GC pause (G1 Evacuation Pause) (young) (initial-mark) 255M->269M(1004M), 0,2623779 secs]
> #7: [GC concurrent-root-region-scan-start]
> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
> #7: [GC concurrent-mark-start]
> #8: [GC pause (G1 Evacuation Pause) (young) 315M->330M(2008M), 0,3007405 secs]
> #9: [GC pause (G1 Evacuation Pause) (young) 422M->418M(3212M), 0,4063937 secs]
> #10: [GC pause (G1 Evacuation Pause) (young) 564M->572M(4176M), 0,7500609 secs]
> #11: [GC pause (G1 Evacuation Pause) (young) 760M->756M(4946M), 1,5464884 secs]
> #7: [GC concurrent-mark-end, 3,9464599 secs]
> #7: [GC remark, 0,0240462 secs]
> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
> #7: [GC concurrent-cleanup-start]
> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
> #12: [GC pause (G1 Evacuation Pause) (young) 977M->978M(5562M), 2,2464423 secs]
>
> Notice how the complete concurrent cycle is decorated with id #7. That makes it possible to follow the concurrent work more easily and grep it out etc.
>
>
> Here's a similar thing for CMS where the concurrent cycle #22 has a young GC #23 in the middle of it:
>
> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K), 1,4558792 secs]
> #22: [GC (CMS Initial Mark) 1336515K(2452584K), 0,1107938 secs]
> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K), 0,7909961 secs]
> #22: [GC (CMS Final Remark) 1304871K(2452584K), 0,2353840 secs]
>
>
> I think this is very helpful information to have, so I would like to push this with the new flag PrintGCID as true by default. I'm sure some parsers will break, but it is an easy work around to run with -XX:-PrintGCID to get the old format. It should even be pretty easy to do post processing on a log file to remove the GC id information. Having it on by default makes it possible for us to use this information and in particular be able to map it to the JFR events we see in MissionControl.
>
> I would also like to backport this to 8u40, but in that case with the default value of PrintGCID as false.
>
> Thanks,
> Bengt
>
>
>
>
>
>
More information about the hotspot-dev
mailing list