RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps

Erik Helin erik.helin at oracle.com
Thu Jun 5 09:43:39 UTC 2014


Hi Bengt,

On Wednesday 04 June 2014 17.06.06 Bengt Rutisson wrote:
> Hi Erik,
> 
> On 5/28/14 2:29 PM, Erik Helin wrote:
> > Hi Bengt,
> > 
> > just a first quick general comment: would it make sense for GCId to be
> > in its own file or class, so the GC logging don't have to depend on
> > the GC trace framework?
> 
> Good point.
> 
> Here is an updated webrev where I separeted out GCId to its own file:
> http://cr.openjdk.java.net/~brutisso/8043607/webrev.02/
> 
> Here's just the diff compared to the earlier version:
> http://cr.openjdk.java.net/~brutisso/8043607/webrev.01-02.diff/

Looks much better! A couple of suggestions:

- How about shortening the names a little bit for the static methods?
  - GCId::create()
  - GCId::peek()

  Since the methods are static, we must always prefix the methods with
  GCId, so this will be as explicit as the current version.

- Instead of using the global static variable UNSET_GCID, what about
  having a third static method, undefined():
  - public static GCId GCId::undefined()

  We could also implement operator== and operator!= so that comparing
  two GCIds becomes a little easier.

What do you think?

Thanks,
Erik

> Jesper, the updated webrev also contains the spelling mistake you
> noticed. Thanks for catching that!
> 
> Bengt
> 
> > Thanks,
> > Erik
> > 
> > On Friday 23 May 2014 07.00.44 Bengt Rutisson wrote:
> >> Hi Jesper and everyone else,
> >> 
> >> After explaining the problem with GenCollectedHeap in my last email I
> >> realized that a different approach would be to just assume that the next
> >> GC id will be the right one when we start the logging before we have
> >> incremented the GC id. I think that is a nicer way of doing it than
> >> "wasting" a GC id as I did before.
> >> 
> >> Here's an updated webrev with that changed:
> >> http://cr.openjdk.java.net/~brutisso/8043607/webrev.01/
> >> 
> >> Here's the diff compared to the last webrev:
> >> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00-01-diff/
> >> 
> >> Thanks,
> >> Bengt
> >> 
> >> On 5/22/14 9:40 AM, Bengt Rutisson wrote:
> >>> Hi Jesper,
> >>> 
> >>> On 5/21/14 7:44 PM, Jesper Wilhelmsson wrote:
> >>>> Hi Bengt,
> >>>> 
> >>>> Good change! Thanks for doing this!
> >>> 
> >>> Thanks for looking at it.
> >>> 
> >>>> A few thoughts:
> >>>> 
> >>>> * There are a few places that passes both a GCTimer and now a GCId.
> >>>> Would it make sense to pass an info-object of some kind that contains
> >>>> misc info that is used for logging/tracing? I'm not suggesting that
> >>>> you should implement that in this change, just a thought for the
> > 
> > future.
> > 
> >>> Yes, I think that may be a good future cleanup to do. But would be
> >>> good as a separate change, just as you point out.
> >>> 
> >>>> * In genCollectdHeap.cpp you increase the gc id only if
> >>>> PrintGCDetails is enabled. I assume this is the only place where a gc
> >>>> id is used when running collectors that use
> >>>> GenCollectedHeap::do_collection(). If not, the id created and used in
> >>>> other places would be different depending on if PrintGCDetails was
> >>>> enabled or not.
> >>> 
> >>> Yes, this is a work around for the strange way the logging for
> >>> GenCollectedHeap works. We do completely different timing and logging
> >>> for PrintGC and PrintGCDetails. That the logging is different is just
> >>> strange. That the the timing is different is in my opinion a bug. The
> >>> pause time reported is shorter if you run with PrintGC than if you run
> >>> with PrintGCDetails.
> >>> 
> >>> This mostly affect CMS. The logging for CMS has several other issues
> >>> as well. I added one comment about that in the bug report. So, I think
> >>> it would be good to give the CMS logging an overhaul, but that should
> >>> definitely be a separate change :) And maybe we should hold off with
> >>> that until the unified logging framework is in place.
> >>> 
> >>>> Anyway, looks good!
> >>> 
> >>> Thanks!
> >>> 
> >>> Bengt
> >>> 
> >>>> Ship it!
> >>>> /Jesper
> >>>> 
> >>>> Bengt Rutisson skrev 21/5/14 11:31:
> >>>>> Hi everyone,
> >>>>> 
> >>>>> Can I have some reviews for this patch:
> >>>>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
> >>>>> 
> >>>>> https://bugs.openjdk.java.net/browse/JDK-8043607
> >>>>> 
> >>>>> This is mostly a GC change but I am sending it to the broader list
> >>>>> since I added
> >>>>> a method to ostream.hpp/cpp. We had this code duplicated in many
> >>>>> places in the
> >>>>> GC code:
> >>>>> 
> >>>>> gclog_or_tty->date_stamp(PrintGCDateStamps);
> >>>>> gclog_or_tty->stamp(PrintGCTimeStamps);
> >>>>> 
> >>>>> And with my patch I had to add one more section to this code
> >>>>> duplication.
> >>>>> Instead I wrapped all three calls in a method called gclog_stamp().
> >>>>> It is a bit
> >>>>> hard to know where to place that method. ostream seems like the
> >>>>> wrong place for
> >>>>> GC specific code, but there is already lots of GC specific code
> >>>>> there and even
> >>>>> gclog_or_tty is declared there. So I added the method there for now:
> >>>>> 
> >>>>> void outputStream::gclog_stamp(uint gc_id) {
> >>>>> 
> >>>>>     date_stamp(PrintGCDateStamps);
> >>>>>     stamp(PrintGCTimeStamps);
> >>>>>     if (PrintGCID) {
> >>>>>     
> >>>>>       print("#%u: ", gc_id);
> >>>>>     
> >>>>>     }
> >>>>> 
> >>>>> }
> >>>>> 
> >>>>> Other than that this is a pure GC change.
> >>>>> 
> >>>>> 
> >>>>> Background:
> >>>>> 
> >>>>> With event based tracing we introduced a GC id field on all GC
> >>>>> events. This id
> >>>>> is a constantly increasing number associated with each GC. I figured
> >>>>> we can use
> >>>>> the this GC id as a decoration for the HotSpot GC logging similarly
> >>>>> to what we
> >>>>> do with PrintGCTimeStamps and PrintGCDateStamps.
> >>>>> 
> >>>>> A nice side effect of this is that it will make it easy to map an
> >>>>> event in the
> >>>>> event tracing to an entry in the GC log. Currently this is
> >>>>> impossible to do
> >>>>> deterministically since the logging and event tracing use different
> >>>>> timing.
> >>>>> 
> >>>>> I propose to add a flag called PrintGCID. Here's an example log
> >>>>> entry for GC
> >>>>> number 4 running with -XX:+PrintGCID:
> >>>>> 
> >>>>> #4: [GC pause  (G1 Evacuation Pause) (young) 185M->212M(502M),
> >>>>> 0,2373420 secs]
> >>>>> 
> >>>>> It is a decoration similar to the timestamps. Here's what it looks
> >>>>> like if you
> >>>>> add -XX:+PrintGCTimeStamps:
> >>>>> 
> >>>>> 3,921: #4: [GC pause  (G1 Evacuation Pause) (young)
> >>>>> 185M->212M(502M), 0,2373420
> >>>>> secs]
> >>>>> 
> >>>>> And here's what it will look like if you add -XX:+PrintGCDateStamps:
> >>>>> 
> >>>>> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause  (G1 Evacuation
> >>>>> Pause)
> >>>>> (young) 185M->212M(502M), 0,2373420 secs]
> >>>>> 
> >>>>> 
> >>>>> Here's a little longer example running ParallelGC:
> >>>>> 
> >>>>> #0: [GC (Allocation Failure)  129024K->2980K(493056K), 0,1050244
> > 
> > secs]
> > 
> >>>>> #1: [GC (Allocation Failure)  132004K->3731K(622080K), 0,0389474
> > 
> > secs]
> > 
> >>>>> #2: [GC (Allocation Failure)  261779K->27153K(622080K), 0,2057294
> > 
> > secs]
> > 
> >>>>> #3: [GC (Allocation Failure)  285201K->224350K(880128K), 0,2410144
> >>>>> secs]
> >>>>> #4: [Full GC (Ergonomics)  224350K->219198K(1154560K), 2,7104481
> > 
> > secs]
> > 
> >>>>> #5: [GC (Allocation Failure)  735294K->638073K(863744K), 0,7707819
> >>>>> secs]
> >>>>> 
> >>>>> Note that for the concurrent GCs this gives a good way of seeing the
> >>>>> concurrent
> >>>>> action. Here is a G1 example:
> >>>>> 
> >>>>> #0: [GC pause  (G1 Evacuation Pause) (young) 24M->2881K(502M),
> >>>>> 0,0383157 secs]
> >>>>> #1: [GC pause  (G1 Evacuation Pause) (young) 48M->4745K(502M),
> >>>>> 0,0412220 secs]
> >>>>> #2: [GC pause  (G1 Evacuation Pause) (young) 136M->9618K(502M),
> >>>>> 0,0426144 secs]
> >>>>> #3: [GC pause  (G1 Evacuation Pause) (young) 248M->143M(502M),
> >>>>> 0,1467633 secs]
> >>>>> #4: [GC pause  (G1 Evacuation Pause) (young) 210M->199M(502M),
> >>>>> 0,1627465 secs]
> >>>>> #5: [GC pause  (G1 Evacuation Pause) (young) 213M->235M(502M),
> >>>>> 0,2376728 secs]
> >>>>> #6: [GC pause  (G1 Evacuation Pause) (young) (initial-mark)
> >>>>> 255M->269M(1004M),
> >>>>> 0,2623779 secs]
> >>>>> #7: [GC concurrent-root-region-scan-start]
> >>>>> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
> >>>>> #7: [GC concurrent-mark-start]
> >>>>> #8: [GC pause  (G1 Evacuation Pause) (young) 315M->330M(2008M),
> >>>>> 0,3007405 secs]
> >>>>> #9: [GC pause  (G1 Evacuation Pause) (young) 422M->418M(3212M),
> >>>>> 0,4063937 secs]
> >>>>> #10: [GC pause  (G1 Evacuation Pause) (young) 564M->572M(4176M),
> >>>>> 0,7500609 secs]
> >>>>> #11: [GC pause  (G1 Evacuation Pause) (young) 760M->756M(4946M),
> >>>>> 1,5464884 secs]
> >>>>> #7: [GC concurrent-mark-end, 3,9464599 secs]
> >>>>> #7: [GC remark, 0,0240462 secs]
> >>>>> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
> >>>>> #7: [GC concurrent-cleanup-start]
> >>>>> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
> >>>>> #12: [GC pause  (G1 Evacuation Pause) (young) 977M->978M(5562M),
> >>>>> 2,2464423 secs]
> >>>>> 
> >>>>> Notice how the complete concurrent cycle is decorated with id #7.
> >>>>> That makes it
> >>>>> possible to follow the concurrent work more easily and grep it out
> > 
> > etc.
> > 
> >>>>> Here's a similar thing for CMS where the concurrent cycle #22 has a
> >>>>> young GC #23
> >>>>> in the middle of it:
> >>>>> 
> >>>>> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K),
> >>>>> 1,4558792 secs]
> >>>>> #22: [GC (CMS Initial Mark)  1336515K(2452584K), 0,1107938 secs]
> >>>>> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K),
> >>>>> 0,7909961 secs]
> >>>>> #22: [GC (CMS Final Remark)  1304871K(2452584K), 0,2353840 secs]
> >>>>> 
> >>>>> 
> >>>>> I think this is very helpful information to have, so I would like to
> >>>>> push this
> >>>>> with the new flag PrintGCID as true by default. I'm sure some
> >>>>> parsers will
> >>>>> break, but it is an easy work around to run with -XX:-PrintGCID to
> >>>>> get the old
> >>>>> format. It should even be pretty easy to do post processing on a log
> >>>>> file to
> >>>>> remove the GC id information. Having it on by default makes it
> >>>>> possible for us
> >>>>> to use this information and in particular be able to map it to the
> >>>>> JFR events we
> >>>>> see in MissionControl.
> >>>>> 
> >>>>> I would also like to backport this to 8u40, but in that case with
> >>>>> the default
> >>>>> value of PrintGCID as false.
> >>>>> 
> >>>>> Thanks,
> >>>>> Bengt



More information about the hotspot-dev mailing list