RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps
Erik Helin
erik.helin at oracle.com
Wed May 28 12:29:31 UTC 2014
Hi Bengt,
just a first quick general comment: would it make sense for GCId to be
in its own file or class, so the GC logging don't have to depend on
the GC trace framework?
Thanks,
Erik
On Friday 23 May 2014 07.00.44 Bengt Rutisson wrote:
> Hi Jesper and everyone else,
>
> After explaining the problem with GenCollectedHeap in my last email I
> realized that a different approach would be to just assume that the next
> GC id will be the right one when we start the logging before we have
> incremented the GC id. I think that is a nicer way of doing it than
> "wasting" a GC id as I did before.
>
> Here's an updated webrev with that changed:
> http://cr.openjdk.java.net/~brutisso/8043607/webrev.01/
>
> Here's the diff compared to the last webrev:
> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00-01-diff/
>
> Thanks,
> Bengt
>
> On 5/22/14 9:40 AM, Bengt Rutisson wrote:
> > Hi Jesper,
> >
> > On 5/21/14 7:44 PM, Jesper Wilhelmsson wrote:
> >> Hi Bengt,
> >>
> >> Good change! Thanks for doing this!
> >
> > Thanks for looking at it.
> >
> >> A few thoughts:
> >>
> >> * There are a few places that passes both a GCTimer and now a GCId.
> >> Would it make sense to pass an info-object of some kind that contains
> >> misc info that is used for logging/tracing? I'm not suggesting that
> >> you should implement that in this change, just a thought for the
future.
> >
> > Yes, I think that may be a good future cleanup to do. But would be
> > good as a separate change, just as you point out.
> >
> >> * In genCollectdHeap.cpp you increase the gc id only if
> >> PrintGCDetails is enabled. I assume this is the only place where a gc
> >> id is used when running collectors that use
> >> GenCollectedHeap::do_collection(). If not, the id created and used in
> >> other places would be different depending on if PrintGCDetails was
> >> enabled or not.
> >
> > Yes, this is a work around for the strange way the logging for
> > GenCollectedHeap works. We do completely different timing and logging
> > for PrintGC and PrintGCDetails. That the logging is different is just
> > strange. That the the timing is different is in my opinion a bug. The
> > pause time reported is shorter if you run with PrintGC than if you run
> > with PrintGCDetails.
> >
> > This mostly affect CMS. The logging for CMS has several other issues
> > as well. I added one comment about that in the bug report. So, I think
> > it would be good to give the CMS logging an overhaul, but that should
> > definitely be a separate change :) And maybe we should hold off with
> > that until the unified logging framework is in place.
> >
> >> Anyway, looks good!
> >
> > Thanks!
> >
> > Bengt
> >
> >> Ship it!
> >> /Jesper
> >>
> >> Bengt Rutisson skrev 21/5/14 11:31:
> >>> Hi everyone,
> >>>
> >>> Can I have some reviews for this patch:
> >>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
> >>>
> >>> https://bugs.openjdk.java.net/browse/JDK-8043607
> >>>
> >>> This is mostly a GC change but I am sending it to the broader list
> >>> since I added
> >>> a method to ostream.hpp/cpp. We had this code duplicated in many
> >>> places in the
> >>> GC code:
> >>>
> >>> gclog_or_tty->date_stamp(PrintGCDateStamps);
> >>> gclog_or_tty->stamp(PrintGCTimeStamps);
> >>>
> >>> And with my patch I had to add one more section to this code
> >>> duplication.
> >>> Instead I wrapped all three calls in a method called gclog_stamp().
> >>> It is a bit
> >>> hard to know where to place that method. ostream seems like the
> >>> wrong place for
> >>> GC specific code, but there is already lots of GC specific code
> >>> there and even
> >>> gclog_or_tty is declared there. So I added the method there for now:
> >>>
> >>> void outputStream::gclog_stamp(uint gc_id) {
> >>>
> >>> date_stamp(PrintGCDateStamps);
> >>> stamp(PrintGCTimeStamps);
> >>> if (PrintGCID) {
> >>>
> >>> print("#%u: ", gc_id);
> >>>
> >>> }
> >>>
> >>> }
> >>>
> >>> Other than that this is a pure GC change.
> >>>
> >>>
> >>> Background:
> >>>
> >>> With event based tracing we introduced a GC id field on all GC
> >>> events. This id
> >>> is a constantly increasing number associated with each GC. I figured
> >>> we can use
> >>> the this GC id as a decoration for the HotSpot GC logging similarly
> >>> to what we
> >>> do with PrintGCTimeStamps and PrintGCDateStamps.
> >>>
> >>> A nice side effect of this is that it will make it easy to map an
> >>> event in the
> >>> event tracing to an entry in the GC log. Currently this is
> >>> impossible to do
> >>> deterministically since the logging and event tracing use different
> >>> timing.
> >>>
> >>> I propose to add a flag called PrintGCID. Here's an example log
> >>> entry for GC
> >>> number 4 running with -XX:+PrintGCID:
> >>>
> >>> #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M),
> >>> 0,2373420 secs]
> >>>
> >>> It is a decoration similar to the timestamps. Here's what it looks
> >>> like if you
> >>> add -XX:+PrintGCTimeStamps:
> >>>
> >>> 3,921: #4: [GC pause (G1 Evacuation Pause) (young)
> >>> 185M->212M(502M), 0,2373420
> >>> secs]
> >>>
> >>> And here's what it will look like if you add -XX:+PrintGCDateStamps:
> >>>
> >>> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause (G1 Evacuation
> >>> Pause)
> >>> (young) 185M->212M(502M), 0,2373420 secs]
> >>>
> >>>
> >>> Here's a little longer example running ParallelGC:
> >>>
> >>> #0: [GC (Allocation Failure) 129024K->2980K(493056K), 0,1050244
secs]
> >>> #1: [GC (Allocation Failure) 132004K->3731K(622080K), 0,0389474
secs]
> >>> #2: [GC (Allocation Failure) 261779K->27153K(622080K), 0,2057294
secs]
> >>> #3: [GC (Allocation Failure) 285201K->224350K(880128K), 0,2410144
> >>> secs]
> >>> #4: [Full GC (Ergonomics) 224350K->219198K(1154560K), 2,7104481
secs]
> >>> #5: [GC (Allocation Failure) 735294K->638073K(863744K), 0,7707819
> >>> secs]
> >>>
> >>> Note that for the concurrent GCs this gives a good way of seeing the
> >>> concurrent
> >>> action. Here is a G1 example:
> >>>
> >>> #0: [GC pause (G1 Evacuation Pause) (young) 24M->2881K(502M),
> >>> 0,0383157 secs]
> >>> #1: [GC pause (G1 Evacuation Pause) (young) 48M->4745K(502M),
> >>> 0,0412220 secs]
> >>> #2: [GC pause (G1 Evacuation Pause) (young) 136M->9618K(502M),
> >>> 0,0426144 secs]
> >>> #3: [GC pause (G1 Evacuation Pause) (young) 248M->143M(502M),
> >>> 0,1467633 secs]
> >>> #4: [GC pause (G1 Evacuation Pause) (young) 210M->199M(502M),
> >>> 0,1627465 secs]
> >>> #5: [GC pause (G1 Evacuation Pause) (young) 213M->235M(502M),
> >>> 0,2376728 secs]
> >>> #6: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
> >>> 255M->269M(1004M),
> >>> 0,2623779 secs]
> >>> #7: [GC concurrent-root-region-scan-start]
> >>> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
> >>> #7: [GC concurrent-mark-start]
> >>> #8: [GC pause (G1 Evacuation Pause) (young) 315M->330M(2008M),
> >>> 0,3007405 secs]
> >>> #9: [GC pause (G1 Evacuation Pause) (young) 422M->418M(3212M),
> >>> 0,4063937 secs]
> >>> #10: [GC pause (G1 Evacuation Pause) (young) 564M->572M(4176M),
> >>> 0,7500609 secs]
> >>> #11: [GC pause (G1 Evacuation Pause) (young) 760M->756M(4946M),
> >>> 1,5464884 secs]
> >>> #7: [GC concurrent-mark-end, 3,9464599 secs]
> >>> #7: [GC remark, 0,0240462 secs]
> >>> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
> >>> #7: [GC concurrent-cleanup-start]
> >>> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
> >>> #12: [GC pause (G1 Evacuation Pause) (young) 977M->978M(5562M),
> >>> 2,2464423 secs]
> >>>
> >>> Notice how the complete concurrent cycle is decorated with id #7.
> >>> That makes it
> >>> possible to follow the concurrent work more easily and grep it out
etc.
> >>>
> >>>
> >>> Here's a similar thing for CMS where the concurrent cycle #22 has a
> >>> young GC #23
> >>> in the middle of it:
> >>>
> >>> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K),
> >>> 1,4558792 secs]
> >>> #22: [GC (CMS Initial Mark) 1336515K(2452584K), 0,1107938 secs]
> >>> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K),
> >>> 0,7909961 secs]
> >>> #22: [GC (CMS Final Remark) 1304871K(2452584K), 0,2353840 secs]
> >>>
> >>>
> >>> I think this is very helpful information to have, so I would like to
> >>> push this
> >>> with the new flag PrintGCID as true by default. I'm sure some
> >>> parsers will
> >>> break, but it is an easy work around to run with -XX:-PrintGCID to
> >>> get the old
> >>> format. It should even be pretty easy to do post processing on a log
> >>> file to
> >>> remove the GC id information. Having it on by default makes it
> >>> possible for us
> >>> to use this information and in particular be able to map it to the
> >>> JFR events we
> >>> see in MissionControl.
> >>>
> >>> I would also like to backport this to 8u40, but in that case with
> >>> the default
> >>> value of PrintGCID as false.
> >>>
> >>> Thanks,
> >>> Bengt
More information about the hotspot-dev
mailing list