RFR (M): JDK-8043607: Add a GC id as a log decoration similar to PrintGCTimeStamps
Bengt Rutisson
bengt.rutisson at oracle.com
Thu Jun 5 11:47:01 UTC 2014
Hi Erik,
Good suggestions!
Updated webrev:
http://cr.openjdk.java.net/~brutisso/8043607/webrev.03/
Webrev with only the changes compared to the last one:
http://cr.openjdk.java.net/~brutisso/8043607/webrev.02-03.diff/
Thanks,
Bengt
On 2014-06-05 11:43, Erik Helin wrote:
> Hi Bengt,
>
> On Wednesday 04 June 2014 17.06.06 Bengt Rutisson wrote:
>> Hi Erik,
>>
>> On 5/28/14 2:29 PM, Erik Helin wrote:
>>> Hi Bengt,
>>>
>>> just a first quick general comment: would it make sense for GCId to be
>>> in its own file or class, so the GC logging don't have to depend on
>>> the GC trace framework?
>> Good point.
>>
>> Here is an updated webrev where I separeted out GCId to its own file:
>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.02/
>>
>> Here's just the diff compared to the earlier version:
>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.01-02.diff/
> Looks much better! A couple of suggestions:
>
> - How about shortening the names a little bit for the static methods?
> - GCId::create()
> - GCId::peek()
>
> Since the methods are static, we must always prefix the methods with
> GCId, so this will be as explicit as the current version.
>
> - Instead of using the global static variable UNSET_GCID, what about
> having a third static method, undefined():
> - public static GCId GCId::undefined()
>
> We could also implement operator== and operator!= so that comparing
> two GCIds becomes a little easier.
>
> What do you think?
>
> Thanks,
> Erik
>
>> Jesper, the updated webrev also contains the spelling mistake you
>> noticed. Thanks for catching that!
>>
>> Bengt
>>
>>> Thanks,
>>> Erik
>>>
>>> On Friday 23 May 2014 07.00.44 Bengt Rutisson wrote:
>>>> Hi Jesper and everyone else,
>>>>
>>>> After explaining the problem with GenCollectedHeap in my last email I
>>>> realized that a different approach would be to just assume that the next
>>>> GC id will be the right one when we start the logging before we have
>>>> incremented the GC id. I think that is a nicer way of doing it than
>>>> "wasting" a GC id as I did before.
>>>>
>>>> Here's an updated webrev with that changed:
>>>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.01/
>>>>
>>>> Here's the diff compared to the last webrev:
>>>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00-01-diff/
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>> On 5/22/14 9:40 AM, Bengt Rutisson wrote:
>>>>> Hi Jesper,
>>>>>
>>>>> On 5/21/14 7:44 PM, Jesper Wilhelmsson wrote:
>>>>>> Hi Bengt,
>>>>>>
>>>>>> Good change! Thanks for doing this!
>>>>> Thanks for looking at it.
>>>>>
>>>>>> A few thoughts:
>>>>>>
>>>>>> * There are a few places that passes both a GCTimer and now a GCId.
>>>>>> Would it make sense to pass an info-object of some kind that contains
>>>>>> misc info that is used for logging/tracing? I'm not suggesting that
>>>>>> you should implement that in this change, just a thought for the
>>> future.
>>>
>>>>> Yes, I think that may be a good future cleanup to do. But would be
>>>>> good as a separate change, just as you point out.
>>>>>
>>>>>> * In genCollectdHeap.cpp you increase the gc id only if
>>>>>> PrintGCDetails is enabled. I assume this is the only place where a gc
>>>>>> id is used when running collectors that use
>>>>>> GenCollectedHeap::do_collection(). If not, the id created and used in
>>>>>> other places would be different depending on if PrintGCDetails was
>>>>>> enabled or not.
>>>>> Yes, this is a work around for the strange way the logging for
>>>>> GenCollectedHeap works. We do completely different timing and logging
>>>>> for PrintGC and PrintGCDetails. That the logging is different is just
>>>>> strange. That the the timing is different is in my opinion a bug. The
>>>>> pause time reported is shorter if you run with PrintGC than if you run
>>>>> with PrintGCDetails.
>>>>>
>>>>> This mostly affect CMS. The logging for CMS has several other issues
>>>>> as well. I added one comment about that in the bug report. So, I think
>>>>> it would be good to give the CMS logging an overhaul, but that should
>>>>> definitely be a separate change :) And maybe we should hold off with
>>>>> that until the unified logging framework is in place.
>>>>>
>>>>>> Anyway, looks good!
>>>>> Thanks!
>>>>>
>>>>> Bengt
>>>>>
>>>>>> Ship it!
>>>>>> /Jesper
>>>>>>
>>>>>> Bengt Rutisson skrev 21/5/14 11:31:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Can I have some reviews for this patch:
>>>>>>> http://cr.openjdk.java.net/~brutisso/8043607/webrev.00/
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8043607
>>>>>>>
>>>>>>> This is mostly a GC change but I am sending it to the broader list
>>>>>>> since I added
>>>>>>> a method to ostream.hpp/cpp. We had this code duplicated in many
>>>>>>> places in the
>>>>>>> GC code:
>>>>>>>
>>>>>>> gclog_or_tty->date_stamp(PrintGCDateStamps);
>>>>>>> gclog_or_tty->stamp(PrintGCTimeStamps);
>>>>>>>
>>>>>>> And with my patch I had to add one more section to this code
>>>>>>> duplication.
>>>>>>> Instead I wrapped all three calls in a method called gclog_stamp().
>>>>>>> It is a bit
>>>>>>> hard to know where to place that method. ostream seems like the
>>>>>>> wrong place for
>>>>>>> GC specific code, but there is already lots of GC specific code
>>>>>>> there and even
>>>>>>> gclog_or_tty is declared there. So I added the method there for now:
>>>>>>>
>>>>>>> void outputStream::gclog_stamp(uint gc_id) {
>>>>>>>
>>>>>>> date_stamp(PrintGCDateStamps);
>>>>>>> stamp(PrintGCTimeStamps);
>>>>>>> if (PrintGCID) {
>>>>>>>
>>>>>>> print("#%u: ", gc_id);
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> Other than that this is a pure GC change.
>>>>>>>
>>>>>>>
>>>>>>> Background:
>>>>>>>
>>>>>>> With event based tracing we introduced a GC id field on all GC
>>>>>>> events. This id
>>>>>>> is a constantly increasing number associated with each GC. I figured
>>>>>>> we can use
>>>>>>> the this GC id as a decoration for the HotSpot GC logging similarly
>>>>>>> to what we
>>>>>>> do with PrintGCTimeStamps and PrintGCDateStamps.
>>>>>>>
>>>>>>> A nice side effect of this is that it will make it easy to map an
>>>>>>> event in the
>>>>>>> event tracing to an entry in the GC log. Currently this is
>>>>>>> impossible to do
>>>>>>> deterministically since the logging and event tracing use different
>>>>>>> timing.
>>>>>>>
>>>>>>> I propose to add a flag called PrintGCID. Here's an example log
>>>>>>> entry for GC
>>>>>>> number 4 running with -XX:+PrintGCID:
>>>>>>>
>>>>>>> #4: [GC pause (G1 Evacuation Pause) (young) 185M->212M(502M),
>>>>>>> 0,2373420 secs]
>>>>>>>
>>>>>>> It is a decoration similar to the timestamps. Here's what it looks
>>>>>>> like if you
>>>>>>> add -XX:+PrintGCTimeStamps:
>>>>>>>
>>>>>>> 3,921: #4: [GC pause (G1 Evacuation Pause) (young)
>>>>>>> 185M->212M(502M), 0,2373420
>>>>>>> secs]
>>>>>>>
>>>>>>> And here's what it will look like if you add -XX:+PrintGCDateStamps:
>>>>>>>
>>>>>>> 2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause (G1 Evacuation
>>>>>>> Pause)
>>>>>>> (young) 185M->212M(502M), 0,2373420 secs]
>>>>>>>
>>>>>>>
>>>>>>> Here's a little longer example running ParallelGC:
>>>>>>>
>>>>>>> #0: [GC (Allocation Failure) 129024K->2980K(493056K), 0,1050244
>>> secs]
>>>
>>>>>>> #1: [GC (Allocation Failure) 132004K->3731K(622080K), 0,0389474
>>> secs]
>>>
>>>>>>> #2: [GC (Allocation Failure) 261779K->27153K(622080K), 0,2057294
>>> secs]
>>>
>>>>>>> #3: [GC (Allocation Failure) 285201K->224350K(880128K), 0,2410144
>>>>>>> secs]
>>>>>>> #4: [Full GC (Ergonomics) 224350K->219198K(1154560K), 2,7104481
>>> secs]
>>>
>>>>>>> #5: [GC (Allocation Failure) 735294K->638073K(863744K), 0,7707819
>>>>>>> secs]
>>>>>>>
>>>>>>> Note that for the concurrent GCs this gives a good way of seeing the
>>>>>>> concurrent
>>>>>>> action. Here is a G1 example:
>>>>>>>
>>>>>>> #0: [GC pause (G1 Evacuation Pause) (young) 24M->2881K(502M),
>>>>>>> 0,0383157 secs]
>>>>>>> #1: [GC pause (G1 Evacuation Pause) (young) 48M->4745K(502M),
>>>>>>> 0,0412220 secs]
>>>>>>> #2: [GC pause (G1 Evacuation Pause) (young) 136M->9618K(502M),
>>>>>>> 0,0426144 secs]
>>>>>>> #3: [GC pause (G1 Evacuation Pause) (young) 248M->143M(502M),
>>>>>>> 0,1467633 secs]
>>>>>>> #4: [GC pause (G1 Evacuation Pause) (young) 210M->199M(502M),
>>>>>>> 0,1627465 secs]
>>>>>>> #5: [GC pause (G1 Evacuation Pause) (young) 213M->235M(502M),
>>>>>>> 0,2376728 secs]
>>>>>>> #6: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
>>>>>>> 255M->269M(1004M),
>>>>>>> 0,2623779 secs]
>>>>>>> #7: [GC concurrent-root-region-scan-start]
>>>>>>> #7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
>>>>>>> #7: [GC concurrent-mark-start]
>>>>>>> #8: [GC pause (G1 Evacuation Pause) (young) 315M->330M(2008M),
>>>>>>> 0,3007405 secs]
>>>>>>> #9: [GC pause (G1 Evacuation Pause) (young) 422M->418M(3212M),
>>>>>>> 0,4063937 secs]
>>>>>>> #10: [GC pause (G1 Evacuation Pause) (young) 564M->572M(4176M),
>>>>>>> 0,7500609 secs]
>>>>>>> #11: [GC pause (G1 Evacuation Pause) (young) 760M->756M(4946M),
>>>>>>> 1,5464884 secs]
>>>>>>> #7: [GC concurrent-mark-end, 3,9464599 secs]
>>>>>>> #7: [GC remark, 0,0240462 secs]
>>>>>>> #7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
>>>>>>> #7: [GC concurrent-cleanup-start]
>>>>>>> #7: [GC concurrent-cleanup-end, 0,0000228 secs]
>>>>>>> #12: [GC pause (G1 Evacuation Pause) (young) 977M->978M(5562M),
>>>>>>> 2,2464423 secs]
>>>>>>>
>>>>>>> Notice how the complete concurrent cycle is decorated with id #7.
>>>>>>> That makes it
>>>>>>> possible to follow the concurrent work more easily and grep it out
>>> etc.
>>>
>>>>>>> Here's a similar thing for CMS where the concurrent cycle #22 has a
>>>>>>> young GC #23
>>>>>>> in the middle of it:
>>>>>>>
>>>>>>> #21: [GC (Allocation Failure) 1621551K->1107977K(2395632K),
>>>>>>> 1,4558792 secs]
>>>>>>> #22: [GC (CMS Initial Mark) 1336515K(2452584K), 0,1107938 secs]
>>>>>>> #23: [GC (Allocation Failure) 1768969K->1184500K(2452584K),
>>>>>>> 0,7909961 secs]
>>>>>>> #22: [GC (CMS Final Remark) 1304871K(2452584K), 0,2353840 secs]
>>>>>>>
>>>>>>>
>>>>>>> I think this is very helpful information to have, so I would like to
>>>>>>> push this
>>>>>>> with the new flag PrintGCID as true by default. I'm sure some
>>>>>>> parsers will
>>>>>>> break, but it is an easy work around to run with -XX:-PrintGCID to
>>>>>>> get the old
>>>>>>> format. It should even be pretty easy to do post processing on a log
>>>>>>> file to
>>>>>>> remove the GC id information. Having it on by default makes it
>>>>>>> possible for us
>>>>>>> to use this information and in particular be able to map it to the
>>>>>>> JFR events we
>>>>>>> see in MissionControl.
>>>>>>>
>>>>>>> I would also like to backport this to 8u40, but in that case with
>>>>>>> the default
>>>>>>> value of PrintGCID as false.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bengt
More information about the hotspot-dev
mailing list