RFR: 8288298: Resolve multiline message parsing ambiguities in UL
Johan Sjölen
jsjolen at openjdk.org
Tue Oct 29 09:23:14 UTC 2024
On Mon, 28 Oct 2024 05:29:18 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> Tools such as JITWatch parse OpenJDK logs (e.g. logs generated from `LogCompilation`) in order to present interesting data to users. In order for these tools to work reliably, UnifiedLogging (UL) needs to have a consistent output scheme.
>>
>> Currently, a UL log message can be formatted in such a way that it looks like a UL decorator prefix, causing issues with parsing. This is because logging functions in UL do not prepend decorators (or any kind of prefix) to newlines in a log message. For example, `log_info(gc)("A\nB");` currently outputs
>>
>> [0s][gc] A
>> B
>>
>> and we could mistakenly interpret `B` as a decorator. Additionally, developers may introduce pseudo-decorators (something that looks like a decorator but is actually part of the log message), yielding an incorrect parse. As a side remark, but also relevant, we hinder human readability when logs appear suddenly skewed at some points.
>>
>> The UL framework should offer a robust way to (a) distinguish decorators from messages, and (b) unambiguously group multiline output. This PR aims to achieve both goals through a subset of the changes proposed in [this mail](https://mail.openjdk.org/pipermail/hotspot-dev/2024-October/095284.html), from which the rest of the ideas can be done as future work. This means:
>>
>> - Decorators cannot contain the symbols `[` or `]`. The special decorator `[ ]` (with a variable number of whitespace between the brackets) is reserved.
>> - We separate decorators from log messages via the first space after a closing bracket in the line.
>> - A log message can contain any kind of symbols, and ends with a newline (except in the case for multiline messages).
>> - We prepend multiline logging (such as in the example above) with the invalid decorator `[ ]`. The invalid decorator is as wide as the indentation of the rest of the log for easy visual reading. For example:
>>
>> [0s][gc] single-line message
>> [1s][gc] another single line message
>> [2s][gc] first line of a multiline message
>> [ ] second and last line of a multiline message
>> [3s][gc] another single line message
>>
>> Note how this is both unambiguously parseable and human readable.
>>
>> For the case where decorators have been disabled, the aforementioned points do not apply (i.e., behaviour remains the same as before). This means no multiline logical connection (such as the one presented above) and no way to separate decorators (the empty set, in this case) from messages. This is intentional as users specifying no ...
>
> src/hotspot/share/logging/logFileStreamOutput.cpp line 174:
>
>> 172: // Handle multiline strings: split the string replacing newlines with terminators,
>> 173: // and then force write_internal_line to print all of them (i.e. not stopping at the
>> 174: // first null but until msg_len bytes are printed)
>
> The string duplication with newline-to-nul conversion likely performs okay for typical short logging statements. But my initial approach would have been to scan through the string looking for the next newline and print it in chunks that way.
This might be the better way to go, with `strstr`.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21712#discussion_r1820413607
More information about the hotspot-runtime-dev
mailing list