Parsing logs (for an AOT Cache analyzer tool)

Sun Sep 21 22:29:04 UTC 2025

On 9/19/25 1:14 AM, María Arias de Reyna Dominguez wrote:
> Hi!
>
> As I mentioned yesterday, I am working on a tool (interactive console) 
> to analyze what is inside the AOT cache, why and when the elements 
> were added (or not), and if there's anything that can be done to 
> improve it.
>
> It can be found here: https://github.com/Delawen/leyden-analyzer
> Warning: this is very much a work in progress. I am changing the way the 
> commands work almost every day as I add more commands and more data, and 
> I don't like how it is shown :)
>
The logs in HotSpot generally have two (sometimes overlapping) purposes:

- For HotSpot developers to debug the implementation

- For users to gain insight about what the JVM is doing.

I think many of the logs in the former group won't be very interesting 
to most users. One example would be the memory ranges you quoted below.

> But when analysing the logs I found several cases in which they are 
> difficult to parse automatically. I am using a consumer that goes line 
> by line, and sometimes you need some context to know what is 
> happening. A very clear example:
>
> [info][aot       ] Allocating RW objects ...
> [info][aot       ] done (218321 objects)
> [info][aot       ] Allocating RO objects ...
> [info][aot       ] done (432657 objects)
>
These can be fixed by combining the output into:

[info][aot       ] Allocated 218321 RW objects
[info][aot       ] Allocated 432657 RO objects
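
On the tool side, a minimal sketch of a line-by-line consumer in Java that handles both shapes. The class name and patterns are hypothetical and written only against the sample lines in this thread. The point is that the two-line form forces the parser to carry state between lines, while the combined form can be parsed one line at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical consumer for the aot allocation messages quoted above.
public class AllocationLogParser {
    // Current form, first line: "Allocating RW objects ..."
    private static final Pattern ALLOCATING =
        Pattern.compile("Allocating (\\w+) objects \\.\\.\\.");
    // Current form, second line: "done (218321 objects)"
    private static final Pattern DONE =
        Pattern.compile("done \\((\\d+) objects\\)");
    // Combined form: "Allocated 218321 RW objects"
    private static final Pattern ALLOCATED =
        Pattern.compile("Allocated (\\d+) (\\w+) objects");

    private final Map<String, Long> counts = new LinkedHashMap<>();
    // Region named by the last "Allocating ..." line, waiting for its "done (...)".
    private String pendingRegion;

    public void accept(String line) {
        Matcher m;
        if ((m = ALLOCATED.matcher(line)).find()) {
            // Combined form: everything needed is on this one line.
            counts.put(m.group(2), Long.parseLong(m.group(1)));
        } else if ((m = ALLOCATING.matcher(line)).find()) {
            pendingRegion = m.group(1);
        } else if ((m = DONE.matcher(line)).find() && pendingRegion != null) {
            // Two-line form: relies on the previous matching line being
            // the "Allocating ..." that this "done" belongs to.
            counts.put(pendingRegion, Long.parseLong(m.group(1)));
            pendingRegion = null;
        }
    }

    public Map<String, Long> counts() {
        return counts;
    }
}
```

If another "Allocating ..." line lands between "Allocating RW objects ..." and its "done (...)" line, the two-line branch attributes the RW count to the wrong region; the one-line branch has no such failure mode.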

I think this particular case probably won't be very useful to the end 
user. It might be better for your tool to parse -Xlog:aot+map and give 
both summary views (how many objects) and detailed views (info about 
each object, or groups of objects, etc).

Also, we could use -Xlog:aot+map as the main gateway for displaying 
information to the user. For example, we could add a summary section 
that counts all objects, the objects of each type, etc.
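
A sketch of what such a summary could compute, in Java. It assumes the tool has already extracted one type name per archived object from -Xlog:aot+map; the ArchivedObject record and the "key = value" output layout are hypothetical, not the actual aot+map format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AotMapSummary {
    // Hypothetical per-object record produced by the tool's own aot+map parser.
    public record ArchivedObject(String typeName) {}

    // Number of objects per type, sorted by type name for stable output.
    public static Map<String, Long> countByType(List<ArchivedObject> objects) {
        return objects.stream().collect(Collectors.groupingBy(
            ArchivedObject::typeName, TreeMap::new, Collectors.counting()));
    }

    // Summary view: total count first, then one "type = count" line per type.
    public static String summary(List<ArchivedObject> objects) {
        StringBuilder sb = new StringBuilder();
        sb.append("total objects = ").append(objects.size()).append('\n');
        countByType(objects).forEach((type, n) ->
            sb.append(type).append(" = ").append(n).append('\n'));
        return sb.toString();
    }
}
```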

> I guess there are not many parallel things happening in the JVM at this 
> point, but if any other log message gets in between, parsing becomes 
> chaotic. A human may get it; a machine will find it confusing.
>
> Also, there are some lines that can be parsed but need "special 
> treatment", such as this line, which has a comma inside one of the 
> values of a comma-separated list:
>
> [info][aot       ] Class  CP entries = 127257, archived =  20941 ( 16,5%), reverted =      0
>
In this particular case, the 16,5% is produced by printing a 
floating-point value with a %f-style format. In some locales the decimal 
separator is the "," character, while in others (such as the US locale) 
it is printed as ".".
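
The same effect is easy to reproduce in Java, where String.format takes an explicit Locale. HotSpot formats these messages in C++, so this only illustrates the locale behavior; the %5.1f%% format string is an assumption inferred from the sample output, and LocalePercent is a hypothetical helper. Formatting with Locale.ROOT pins the separator to ".", and a tolerant reader can accept either separator.

```java
import java.util.Locale;

public class LocalePercent {
    // Formats a percentage; the decimal separator comes from the given locale.
    public static String format(double pct, Locale locale) {
        return String.format(locale, "%5.1f%%", pct);
    }

    // Tolerant reader: accepts either ',' or '.' as the decimal separator.
    // Safe here only because these values never carry a thousands separator.
    public static double parse(String text) {
        String s = text.trim();
        if (s.endsWith("%")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return Double.parseDouble(s.replace(',', '.'));
    }

    public static void main(String[] args) {
        System.out.println(format(16.5, Locale.US));      // " 16.5%"
        System.out.println(format(16.5, Locale.GERMANY)); // " 16,5%"
        System.out.println(format(16.5, Locale.ROOT));    // " 16.5%" regardless of the default locale
        System.out.println(parse(" 16,5%"));              // 16.5
    }
}
```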

> Then there are other inconsistencies that are not that problematic, but 
> fixing them could make the log easier to parse. For example, see the 
> following lines, which carry similar information but display it in very 
> different ways:
>
> [info][aot       ] Reserved output buffer space at 0x00007f5702e00000 [1084227584 bytes]
> [info][aot] Reserved archive_space_rs [0x0000000057000000 - 0x000000005c000000] (83886080) bytes (includes protection zone)
> [info][aot] Reserved class_space_rs   [0x000000005c000000 - 0x000000009c000000] (1073741824) bytes
> [info][aot] Mapped static  region #0 at base 0x0000000057001000 top 0x0000000058fbe000 (ReadWrite)
> [info][aot       ] Heap range = [0x00000000e0000000 - 0x0000000100000000]
> [info][aot       ] Shared file region (rw) 0: 31818032 bytes, addr 0x0000000800001000 file offset 0x00001000 crc 0xc67c8575
>
> In my opinion, it would make sense to have a common way of writing 
> region addresses so that the parser only needs to implement one way of 
> parsing them. This was a very obvious case, but I'm sure there are 
> others out there that would benefit from some guidelines on how to 
> output data.
>
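
To make the cost concrete: with the formats above, a parser that only wants a start/end pair needs one pattern per shape. A minimal Java sketch, written only against these sample lines; the class and method names are hypothetical. It treats the bracketed [start - end] form as half-open, which matches the sample (0x5c000000 - 0x57000000 = 83886080, the size printed next to it).

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressRanges {
    // Half-open address range [start, end).
    public record Range(long start, long end) {}

    // One log shape: a pattern plus how to turn its match into a Range.
    private record Shape(Pattern pattern, Function<Matcher, Range> toRange) {}

    private static final String HEX = "0x[0-9a-fA-F]+";

    private static final List<Shape> SHAPES = List.of(
        // "[0x0000000057000000 - 0x000000005c000000]"
        new Shape(Pattern.compile("\\[(" + HEX + ") - (" + HEX + ")\\]"),
                  m -> new Range(hex(m.group(1)), hex(m.group(2)))),
        // "at base 0x0000000057001000 top 0x0000000058fbe000"
        new Shape(Pattern.compile("base (" + HEX + ") top (" + HEX + ")"),
                  m -> new Range(hex(m.group(1)), hex(m.group(2)))),
        // "at 0x00007f5702e00000 [1084227584 bytes]"
        new Shape(Pattern.compile("at (" + HEX + ") \\[(\\d+) bytes\\]"),
                  m -> sized(hex(m.group(1)), Long.parseLong(m.group(2)))),
        // "31818032 bytes, addr 0x0000000800001000"
        new Shape(Pattern.compile("(\\d+) bytes, addr (" + HEX + ")"),
                  m -> sized(hex(m.group(2)), Long.parseLong(m.group(1)))));

    private static long hex(String s) {
        return Long.parseUnsignedLong(s.substring(2), 16); // drop the "0x" prefix
    }

    private static Range sized(long start, long size) {
        return new Range(start, start + size);
    }

    // Returns the first address range found in a log line, trying each shape in order.
    public static Optional<Range> find(String line) {
        for (Shape shape : SHAPES) {
            Matcher m = shape.pattern().matcher(line);
            if (m.find()) {
                return Optional.of(shape.toRange().apply(m));
            }
        }
        return Optional.empty();
    }
}
```

If all region messages used one shape, for example the bracketed [start - end] form, the list would shrink to a single pattern.
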
> I intend to improve the log messages to make them easier to parse (while 
> not breaking the human-readable side), following suggestions from 
> https://cr.openjdk.org/~jrose/jvm/parsing-logs.html, which I found very 
> complete.
>
> Do we have a "good-practices guideline for OpenJDK developers" on how 
> to write log messages? If not, do I start one? Where?
>
> Should I add new log messages instead of modifying the existing ones, 
> in case someone is already parsing them, as an intermediate step 
> before "deprecating" the current messages?
>
> Some of the things I already have in mind:
>  - Better "CSV-style" lists of data
>  - Try to keep context on the same line (if you read a line alone, you 
> should understand it)
>  - Be more consistent in using "=" or ":" when specifying values (like 
> "[info][aot] Core region alignment: 4096" versus "Selected 
> AOTMode=record because AOTCacheOutput is specified")
>  - Be more consistent in general with similar types of data and similar 
> messages
>
CSV styles would be easier to parse but harder to read (and harder to 
generate, as you'd need to worry about quoting the comma character).
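
For reference, this is roughly what the quoting rule looks like in RFC 4180 style: a field containing a comma, double quote, or line break is wrapped in double quotes, and embedded quotes are doubled. A minimal Java sketch; CsvField is a hypothetical helper. The locale-formatted percentage discussed above is exactly such a field.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CsvField {
    // Quote a field only when it contains a character that is special in CSV.
    public static String quote(String field) {
        boolean special = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!special) {
            return field;
        }
        // Wrap in double quotes and double any embedded double quote.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join already-formatted values into one CSV record.
    public static String row(List<String> fields) {
        return fields.stream().map(CsvField::quote).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Prints: Class,127257,20941,"16,5%",0
        System.out.println(row(List.of("Class", "127257", "20941", "16,5%", "0")));
    }
}
```

Readers then need the matching unquoting logic as well, so both the producer and every consumer carry the extra rule.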

Overall, I think we need to decide what information is useful for the 
user, and then only change those logs (if necessary) for better parsing. 
We probably don't want to change all the existing logs (there are too 
many of them).

Thanks

- Ioi

> What do you think?
>
> Cheers!
> María Arias de Reyna Domínguez
> Senior Software Engineer
> She / Her / Hers