RFR (L): 8046148: JEP 158 Unified JVM Logging

Kirk Pepperdine kirk.pepperdine at gmail.com
Tue Sep 15 07:32:08 UTC 2015


Hi Loi,

> 
> I am not sure if human readable text would necessarily mean unparsable text. I've written many many log parsing scripts using regexp matching.

Well, if we looked at the regex needed to parse the examples below, it would be… (don’t forget the localization difference between European and North American number formats)
\\[(\\d+(?:\\.|,)\\d+)s\\]\\[(.+)\\]\\[and now it gets messy for a bit because regex doesn’t handle repeated fields all that well...\\](.+)

So the messy part could be (.+): extract that capture group, then split it and trim out the spaces, or something like that. Unless there are square brackets in the tags, this looks OK, and better than before. My only question is, where would dates show up? People like dates because they help correlate events with information they might have in other logs. If we add a regular DateTime format and allow the date to be optional, you get…

\\[(?:(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}[+-]\\d{4}): )?(-?\\d+(?:\\.|,)\\d+)\\]\\[(.+)\\]\\[and now it gets messy for a bit\\](.+)

and so on… I don’t mind, as long as the format remains consistent. However, my experience is that people will inject arbitrary changes (that is, changes with no real added value). You don’t really see this if you are supporting tooling for your own in-house applications, because you pick up the change and then move on. If you are supporting multiple versions of the JVM, then it gets interesting.
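To make the cost of that parsing concrete, here is a minimal Java sketch of the approach described above: one regex for the fixed fields, then a split-and-trim pass over the tag group. The line layout it assumes, `[optional-date: uptime][level][tag, tag] message`, is an assumption for illustration and not the final UL format. The class name `UlLineSketch`, the `Entry` holder and the idea that the second bracket carries a level are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the bracketed layout below is assumed, not specified by UL.
final class UlLineSketch {

    // Group 1: optional ISO-style date, group 2: uptime with '.' or ',' as the
    // decimal separator, group 3: level, group 4: raw tag list, group 5: message.
    private static final Pattern LINE = Pattern.compile(
        "\\[(?:(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}[+-]\\d{4}): )?"
        + "(-?\\d+(?:\\.|,)\\d+)s?\\]"
        + "\\[([^\\]]+)\\]"
        + "\\[([^\\]]*)\\]"
        + "\\s*(.*)");

    static final class Entry {
        final String date;          // null when the line carries no date
        final double uptimeSeconds;
        final String level;
        final List<String> tags;
        final String message;

        Entry(String date, double uptimeSeconds, String level,
              List<String> tags, String message) {
            this.date = date;
            this.uptimeSeconds = uptimeSeconds;
            this.level = level;
            this.tags = tags;
            this.message = message;
        }
    }

    // Returns null when the line does not match the assumed layout.
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Normalise a European decimal comma before parsing the number.
        double uptime = Double.parseDouble(m.group(2).replace(',', '.'));

        // The repeated field is handled outside the regex: split, then trim.
        List<String> tags = new ArrayList<>();
        for (String raw : m.group(4).split(",")) {
            String tag = raw.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return new Entry(m.group(1), uptime, m.group(3), tags, m.group(5));
    }
}
```

Note that the tag group uses `[^\\]]*` rather than `(.+)`, so a square bracket inside a tag would still break the match, which is exactly the fragility mentioned above.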

> 
> In order to write a good parser, I need to understand what's in the log. It's much easier if the log is human readable, like
> 
> size = 1234 bytes, speed = 5678 ms
> 
> rather than
> 
> 1234,5678
> 
> UL allows a mixture of several types of logs (e.g., GC and class loading). I don't know how this can be represented in a CSV file.

And indeed you don’t want a system that does the equivalent of a pretty print to stdout. It is the programming equivalent of dumpster diving. It prevents one from being able to make reasonable decisions that increase the information density in logs. I say this because, I’ll repeat, more than 60% of the applications I run into currently have performance issues that are related to how logging has been implemented. In ~40% of the applications I tune, the primary problem is logging. My hope was that UL would not add to the problem, but I fear it will. My problem with size = xxxx bytes, speed = xxxx ms is that it is an exceptionally information-sparse format. And I would argue that you only need to be told once what this represents: 1887488K->1887488K(1887488K). It’s verbose enough. So, having a single header, "size in bytes, speed in ms", followed by lines of xxxx,xxxx is a reasonable compromise.
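As a rough illustration of that compromise, the following Java sketch writes the same record in the verbose form and in the compact form, and parses a log whose first line is the header. The class name `HeaderOnceLogSketch`, the exact header text and the assumption that every column is an integer are hypothetical and only mirror the size/speed example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "header once, values after" log layout.
final class HeaderOnceLogSketch {

    // Written once at the top of the log; names and units live only here.
    static final String HEADER = "size in bytes,speed in ms";

    // Verbose form: repeats the field names and units on every line.
    static String verbose(long sizeBytes, long speedMs) {
        return "size = " + sizeBytes + " bytes, speed = " + speedMs + " ms";
    }

    // Compact form: values only, in header order.
    static String compact(long sizeBytes, long speedMs) {
        return sizeBytes + "," + speedMs;
    }

    // Parses a log whose first line is the header and whose remaining lines
    // are comma-separated integer rows with one value per header column.
    static List<long[]> parse(List<String> lines) {
        String[] columns = lines.get(0).split(",");
        List<long[]> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",");
            if (fields.length != columns.length) {
                throw new IllegalArgumentException("unexpected field count: " + line);
            }
            long[] row = new long[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Long.parseLong(fields[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Here `compact(1234, 5678)` yields `1234,5678`, while `verbose(1234, 5678)` yields `size = 1234 bytes, speed = 5678 ms`; the reader learns the meaning of the columns from the header once instead of from every line.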
> 
> My criteria for a good log is:
> 
> + human readable
> + consistent
> + easy to parse

+ good information density.

Kind regards,
Kirk



More information about the hotspot-dev mailing list