Parsing logs (for an AOT Cache analyzer tool)

Fri Sep 19 09:35:58 UTC 2025

Hi Maria,

For grouped messages, please look at using NonInterleavingLogStream instead of single-line messages. With JDK-8288298 having been integrated, these in tandem should be enough to resolve any ambiguities such as the one in the first example. I believe that the result will be

[info][aot       ] Allocating RW objects ...
[                      ] done (218321 objects)
[info][aot       ] Allocating RO objects ...
[                      ] done (432657 objects)

> Do we have a "good-practices guideline for OpenJDK developers" on how to write log messages? If not, do I start one? Where?

As far as I know, we have no such guide. If we actually do not have one, then my vote would be to put one into the docs/ folder in the JDK repo.

All the best,
Johan Sjölén

________________________________________
From: leyden-dev <leyden-dev-retn at openjdk.org> on behalf of María Arias de Reyna Dominguez <mariasde at redhat.com>
Sent: Friday, September 19, 2025 10:14
To: leyden-dev at openjdk.org
Subject: Parsing logs (for an AOT Cache analyzer tool)

Hi!

As I mentioned yesterday, I am working on a tool (interactive console) to analyze what is inside the AOT cache, why and when the elements were added (or not), and if there's anything that can be done to improve it.

It can be found here: https://github.com/Delawen/leyden-analyzer
Warning: very much work in progress, I am changing the way the commands work almost everyday as I add more commands and more data and I don't like how it is shown :)

But when analysing logs I found out there are several cases in which it is difficult to parse it automatically. I am using a consumer that goes line by line, and sometimes you need some context to know what is happening. A very clear example:

[info][aot       ] Allocating RW objects ...
[info][aot       ] done (218321 objects)
[info][aot       ] Allocating RO objects ...
[info][aot       ] done (432657 objects)

I guess there are not many parallel things happening at this time on the JVM, but if any other log message gets in between, that would be chaotic. A human may get it, a machine will find it confusing.

Also, there are some lines that can be parsed, but need "special treatment" like for example this line that has a comma inside the content of a comma-separated list of values:

[info][aot       ] Class  CP entries = 127257, archived =  20941 ( 16,5%), reverted =      0

Then there are other inconsistencies that are not that problematic but fixing them could make parsing the log easier. For example, see the following lines, which have similar information but displayed on very different ways:

[info][aot       ] Reserved output buffer space at 0x00007f5702e00000 [1084227584 bytes]
[info][aot] Reserved archive_space_rs [0x0000000057000000 - 0x000000005c000000] (83886080) bytes (includes protection zone)
[info][aot] Reserved class_space_rs   [0x000000005c000000 - 0x000000009c000000] (1073741824) bytes
[info][aot] Mapped static  region #0 at base 0x0000000057001000 top 0x0000000058fbe000 (ReadWrite)
[info][aot       ] Heap range = [0x00000000e0000000 - 0x0000000100000000]
[info][aot       ] Shared file region (rw) 0: 31818032 bytes, addr 0x0000000800001000 file offset 0x00001000 crc 0xc67c8575

In my opinion, it would make sense to have a common way of writing region addresses so the parser only needs to implement one way of parsing it. And this was a very obvious case, but I'm sure there are others out there that would benefit from some guidelines on how to output data.

I intend to improve the log messages to make it easier to parse (while not breaking the human-readable side) following suggestions from https://cr.openjdk.org/~jrose/jvm/parsing-logs.html which I found very complete.

Do we have a "good-practices guideline for OpenJDK developers" on how to write log messages? If not, do I start one? Where?

Should I add new log messages instead of modifying the existing ones in case someone is already parsing them? As an intermediate step before "deprecating" the current messages.

Some of the things I already have in mind:
 - Better "CSV-style" lists of data
 - Try to keep context in the same line (if you read a line alone, you should understand it)
 - Be more consistent in using "=" or ":" when specifying values (like "[info][aot] Core region alignment: 4096" versus "Selected AOTMode=record because AOTCacheOutput is specified")
 - Be more consistent in general with similar type of data and similar messages

What do you think?

Cheers!
María Arias de Reyna Domínguez
Senior Software Engineer
She / Her / Hers