RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps
Bengt Rutisson
bengt.rutisson at oracle.com
Wed Jun 4 15:06:06 UTC 2014
Hi Erik,
On 5/28/14 2:29 PM, Erik Helin wrote:
> Hi Bengt,
>
> just a first quick general comment: would it make sense for GCId to be
> in its own file or class, so the GC logging don't have to depend on
> the GC trace framework?
Good point.
Here is an updated webrev where I separeted out GCId to its own file:
http://cr.openjdk.java.net/~brutisso/8043607/webrev.02/
Here's just the diff compared to the earlier version:
http://cr.openjdk.java.net/~brutisso/8043607/webrev.01-02.diff/
Jesper, the updated webrev also contains the spelling mistake you
noticed. Thanks for catching that!
Bengt
>
> Thanks,
> Erik
>
> On Friday 23 May 2014 07.00.44 Bengt Rutisson wrote:
>> Hi Jesper and everyone else,
>>
>> After explaining the problem with GenCollectedHeap in my last email I
>> realized that a different approach would be to just assume that the next
>> GC id will be the right one when we start the logging before we have
>> incremented the GC id. I think that is a nicer way of doing it than
>> "wasting" a GC id as I did before.
>>
>> Here's an updated webrev with that changed:
>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.01/
>>
>> Here's the diff compared to the last webrev:
>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00-01-diff/
>>
>> Thanks,
>> Bengt
>>
>> On 5/22/14 9:40 AM, Bengt Rutisson wrote:
>>> Hi Jesper,
>>>
>>> On 5/21/14 7:44 PM, Jesper Wilhelmsson wrote:
>>>> Hi Bengt,
>>>>
>>>> Good change! Thanks for doing this!
>>> Thanks for looking at it.
>>>
>>>> A few thoughts:
>>>>
>>>> * There are a few places that passes both a GCTimer and now a GCId.
>>>> Would it make sense to pass an info-object of some kind that contains
>>>> misc info that is used for logging/tracing? I'm not suggesting that
>>>> you should implement that in this change, just a thought for the
> future.
>>> Yes, I think that may be a good future cleanup to do. But would be
>>> good as a separate change, just as you point out.
>>>
>>>> * In genCollectdHeap.cpp you increase the gc id only if
>>>> PrintGCDetails is enabled. I assume this is the only place where a gc
>>>> id is used when running collectors that use
>>>> GenCollectedHeap::do_collection(). If not, the id created and used in
>>>> other places would be different depending on if PrintGCDetails was
>>>> enabled or not.
>>> Yes, this is a work around for the strange way the logging for
>>> GenCollectedHeap works. We do completely different timing and logging
>>> for PrintGC and PrintGCDetails. That the logging is different is just
>>> strange. That the the timing is different is in my opinion a bug. The
>>> pause time reported is shorter if you run with PrintGC than if you run
>>> with PrintGCDetails.
>>>
>>> This mostly affect CMS. The logging for CMS has several other issues
>>> as well. I added one comment about that in the bug report. So, I think
>>> it would be good to give the CMS logging an overhaul, but that should
>>> definitely be a separate change :) And maybe we should hold off with
>>> that until the unified logging framework is in place.
>>>
>>>> Anyway, looks good!
>>> Thanks!
>>>
>>> Bengt
>>>
>>>> Ship it!
>>>> /Jesper
>>>>
>>>> Bengt Rutisson skrev 21/5/14 11:31:
>>>>> Hi everyone,
>>>>>
>>>>> Can I have some reviews for this patch:
>>>>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8043607
>>>>>
>>>>> This is mostly a GC change but I am sending it to the broader list
>>>>> since I added
>>>>> a method to ostream.hpp/cpp. We had this code duplicated in many
>>>>> places in the
>>>>> GC code:
>>>>>
>>>>> gclog_or_tty->date_stamp(PrintGCDateStamps);
>>>>> gclog_or_tty->stamp(PrintGCTimeStamps);
>>>>>
>>>>> And with my patch I had to add one more section to this code
>>>>> duplication.
>>>>> Instead I wrapped all three calls in a method called gclog_stamp().
>>>>> It is a bit
>>>>> hard to know where to place that method. ostream seems like the
>>>>> wrong place for
>>>>> GC specific code, but there is already lots of GC specific code
>>>>> there and even
>>>>> gclog_or_tty is declared there. So I added the method there for now:
>>>>>
>>>>> void outputStream::gclog_stamp(uint gc_id) {
>>>>>
>>>>> date_stamp(PrintGCDateStamps);
>>>>> stamp(PrintGCTimeStamps);
>>>>> if (PrintGCID) {
>>>>>
>>>>> print("#%u: ", gc_id);
>>>>>
>>>>> }
>>>>>
>>>>> }
>>>>>
>>>>> Other than that this is a pure GC change.
>>>>>
>>>>>
>>>>> Background:
>>>>>
>>>>> With event based tracing we introduced a GC id field on all GC
>>>>> events. This id
>>>>> is a constantly increasing number associated with each GC. I figured
>>>>> we can use
>>>>> the this GC id as a decoration for the HotSpot GC logging similarly
>>>>> to what we
>>>>> do with PrintGCTimeStamps and PrintGCDateStamps.
>>>>>
>>>>> A nice side effect of this is that it will make it easy to map an
>>>>> event in the
>>>>> event tracing to an entry in the GC log. Currently this is
>>>>> impossible to do
>>>>> deterministically since the logging and event tracing use different
>>>>> timing.
>>>>>
>>>>> I propose to add a flag called PrintGCID. Here's an example log
>>>>> entry for GC
>>>>> number 4 running with -XX:+PrintGCID:
>>>>>
>>>>> #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M),
>>>>> 0,2373420 secs]
>>>>>
>>>>> It is a decoration similar to the timestamps. Here's what it looks
>>>>> like if you
>>>>> add -XX:+PrintGCTimeStamps:
>>>>>
>>>>> 3,921: #4: [GC pause (G1 Evacuation Pause) (young)
>>>>> 185M->212M(502M), 0,2373420
>>>>> secs]
>>>>>
>>>>> And here's what it will look like if you add -XX:+PrintGCDateStamps:
>>>>>
>>>>> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause (G1 Evacuation
>>>>> Pause)
>>>>> (young) 185M->212M(502M), 0,2373420 secs]
>>>>>
>>>>>
>>>>> Here's a little longer example running ParallelGC:
>>>>>
>>>>> #0: [GC (Allocation Failure) 129024K->2980K(493056K), 0,1050244
> secs]
>>>>> #1: [GC (Allocation Failure) 132004K->3731K(622080K), 0,0389474
> secs]
>>>>> #2: [GC (Allocation Failure) 261779K->27153K(622080K), 0,2057294
> secs]
>>>>> #3: [GC (Allocation Failure) 285201K->224350K(880128K), 0,2410144
>>>>> secs]
>>>>> #4: [Full GC (Ergonomics) 224350K->219198K(1154560K), 2,7104481
> secs]
>>>>> #5: [GC (Allocation Failure) 735294K->638073K(863744K), 0,7707819
>>>>> secs]
>>>>>
>>>>> Note that for the concurrent GCs this gives a good way of seeing the
>>>>> concurrent
>>>>> action. Here is a G1 example:
>>>>>
>>>>> #0: [GC pause (G1 Evacuation Pause) (young) 24M->2881K(502M),
>>>>> 0,0383157 secs]
>>>>> #1: [GC pause (G1 Evacuation Pause) (young) 48M->4745K(502M),
>>>>> 0,0412220 secs]
>>>>> #2: [GC pause (G1 Evacuation Pause) (young) 136M->9618K(502M),
>>>>> 0,0426144 secs]
>>>>> #3: [GC pause (G1 Evacuation Pause) (young) 248M->143M(502M),
>>>>> 0,1467633 secs]
>>>>> #4: [GC pause (G1 Evacuation Pause) (young) 210M->199M(502M),
>>>>> 0,1627465 secs]
>>>>> #5: [GC pause (G1 Evacuation Pause) (young) 213M->235M(502M),
>>>>> 0,2376728 secs]
>>>>> #6: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
>>>>> 255M->269M(1004M),
>>>>> 0,2623779 secs]
>>>>> #7: [GC concurrent-root-region-scan-start]
>>>>> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
>>>>> #7: [GC concurrent-mark-start]
>>>>> #8: [GC pause (G1 Evacuation Pause) (young) 315M->330M(2008M),
>>>>> 0,3007405 secs]
>>>>> #9: [GC pause (G1 Evacuation Pause) (young) 422M->418M(3212M),
>>>>> 0,4063937 secs]
>>>>> #10: [GC pause (G1 Evacuation Pause) (young) 564M->572M(4176M),
>>>>> 0,7500609 secs]
>>>>> #11: [GC pause (G1 Evacuation Pause) (young) 760M->756M(4946M),
>>>>> 1,5464884 secs]
>>>>> #7: [GC concurrent-mark-end, 3,9464599 secs]
>>>>> #7: [GC remark, 0,0240462 secs]
>>>>> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
>>>>> #7: [GC concurrent-cleanup-start]
>>>>> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
>>>>> #12: [GC pause (G1 Evacuation Pause) (young) 977M->978M(5562M),
>>>>> 2,2464423 secs]
>>>>>
>>>>> Notice how the complete concurrent cycle is decorated with id #7.
>>>>> That makes it
>>>>> possible to follow the concurrent work more easily and grep it out
> etc.
>>>>>
>>>>> Here's a similar thing for CMS where the concurrent cycle #22 has a
>>>>> young GC #23
>>>>> in the middle of it:
>>>>>
>>>>> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K),
>>>>> 1,4558792 secs]
>>>>> #22: [GC (CMS Initial Mark) 1336515K(2452584K), 0,1107938 secs]
>>>>> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K),
>>>>> 0,7909961 secs]
>>>>> #22: [GC (CMS Final Remark) 1304871K(2452584K), 0,2353840 secs]
>>>>>
>>>>>
>>>>> I think this is very helpful information to have, so I would like to
>>>>> push this
>>>>> with the new flag PrintGCID as true by default. I'm sure some
>>>>> parsers will
>>>>> break, but it is an easy work around to run with -XX:-PrintGCID to
>>>>> get the old
>>>>> format. It should even be pretty easy to do post processing on a log
>>>>> file to
>>>>> remove the GC id information. Having it on by default makes it
>>>>> possible for us
>>>>> to use this information and in particular be able to map it to the
>>>>> JFR events we
>>>>> see in MissionControl.
>>>>>
>>>>> I would also like to backport this to 8u40, but in that case with
>>>>> the default
>>>>> value of PrintGCID as false.
>>>>>
>>>>> Thanks,
>>>>> Bengt
More information about the hotspot-dev
mailing list