JEP 271: Unified GC Logging - First pre-review

Wed Oct 28 15:07:55 UTC 2015

Hi Bengt,

> On Oct 28, 2015, at 1:07 AM, Bengt Rutisson <bengt.rutisson at oracle.com> wrote:
> 
> 
> Hi Kirk,
> 
> On 2015-10-27 15:37, Kirk Pepperdine wrote:
>> Hi Bengt,
>> 
>>> [1.165s][info   ][gc,cpu      ] GC#1 CPU Times: user=0.39s sys=0.00s real=0.40s
>> If the tag [gc,cpu] indicates this is CPU times could you not shorted the test to …] GC#1 user=0.39s sys=0.0s real=0.40s
>> 
>> It may sound like a trivial change as the reduction is only 11 bytes. But in the large, 11 bytes @ number of GC events starts to add up very quickly. As the expression says, you can fill a water tower one spoon full at a time.
> 
> Sure, the only problem is that decorations are optional. You can turn them off. On the other hand if you do that you have explicitly chosen to get less information.

Of course my use case for this data is to parse it and perform analytics on it. So my primary concern is how easy is it to parse and how accurate is the data. Those are the things I’ll fight you for ;-). The other things I’ll mention here I do consider important but unless they affect accuracy or make it difficult to parse…. For example, we’re not done with the M vs. K resolution point. The most common problems I see with (all) logging involve; unnecessarily bulky messages, rate at which messages are logged, and logging synchronization bottlenecks. As I mentioned before, these problem appear in more than half of the applications I work with. That they are not being broadly reported is an observability issue. These problems generally aren’t seen in small examples or in tests run on developers desktop. These are problems of aggregation that you only see at scale (hence the water tower analogy) or in messy/noisy production environments.

On the M bytes vs K vs B bytes resolution. Using B instead of M means you just print the raw value instead of forcing a divide. if you shift (to divide by 2^20) you truncate the value so the resolution is now close to that of a 1M region. That will introduce quite a bit of slop into any downstream calculations. With GC I don’t mind a bit of slop so K is ok but reducing resolution to M is a bit too much. That said, you said you’d fix that after the initial effort so I will remain quiet on this point for the moment.

On the bulk vs frequency issues. my point is, a GC log is not a novel, nor is it a work of art, it’s a GC log. :-) When I look at this log format I’m looking at it with an eye on parsing it. Thus I’l looking at the line, pulling out the data that I will use and noting the data that I’ll simply skip over because it’s redundant. Generally, all of the numbers will be pulled and text will be used to provide context. So, I am pointing out text in the messages that I don’t need to determine context. In my opinion, removing this extraneous or redundant information helps ease GC loggings contribution to the bulk issue. Combining lines helps with GC loggings contribution to the frequency issue. In the small world of GC logging these are not big problems. However, GC logging is but one of the things that will use UL so it’s my opinion that being overly verbose does matter. My vested interest is to have people turn this on and use it which means logging overheads will matter. That said, I’m not going to “fight" you for adding extra text here or there. I’ll just program my parser to ignore it and hope that it doesn’t send applications over a tipping point.

> 
> I'm fine with keeping or removing "CPU Times". Let's see if anyone else has any opinion.

I’m fine with keeping it also. That said, having essentially the same information in the tags and the payload seems wasteful and/or and ineffective use of tags. I’d either use the tag (my preference in this case) or the payload but not both. From a rate of information, of the 46 bytes in payload, 11 bytes are effectively redundant implying ~20% of the message is noise.

So maybe this is helpful. When I look at this line I see the tag info as adding no value. However gc,cpu is important as it provides context. I don’t trust the time stamp but then it’s redundant because the information in the log line is part of the pervasively logged GC event. In that context the GC#1 maybe helpful but most likely I’ll ignore it. I’ve already commented on CPU Times:  and of course the rest of the payload is useful. In this message there are 20 bytes of data in the 78 bytes present. This message is ~75% overhead. Indeed for 1 message who cares.. but...

Regards,
Kirk

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 496 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20151028/59736010/signature.asc>