RFR: 8288298: Resolve multiline message parsing ambiguities in UL [v4]

Johan Sjölen jsjolen at openjdk.org
Fri Nov 8 09:25:25 UTC 2024


On Thu, 31 Oct 2024 12:44:44 GMT, Antón Seoane <duke at openjdk.org> wrote:

>> Tools such as JITWatch parse OpenJDK logs (e.g. logs generated from `LogCompilation`) in order to present interesting data to users. For these tools to work reliably, Unified Logging (UL) needs a consistent output scheme.
>> 
>> Currently, a UL log message can be formatted in such a way that it looks like a UL decorator prefix, causing issues with parsing. This is because logging functions in UL do not prepend decorators (or any kind of prefix) to newlines in a log message. For example, `log_info(gc)("A\nB");` currently outputs
>> 
>> [0s][gc] A
>> B
>> 
>> and we could mistakenly interpret `B` as a decorator. Additionally, developers may introduce pseudo-decorators (something that looks like a decorator but is actually part of the log message), yielding an incorrect parse. Human readability also suffers, because continuation lines break the alignment of the surrounding log output.
>> 
>> The UL framework should offer a robust way to (a) distinguish decorators from messages, and (b) unambiguously group multiline output. This PR aims to achieve both goals by implementing a subset of the changes proposed in [this mail](https://mail.openjdk.org/pipermail/hotspot-dev/2024-October/095284.html); the remaining ideas are left as future work. This means:
>> 
>> - Decorators cannot contain the symbols `[` or `]`. The special decorator `[ ]` (with a variable amount of whitespace between the brackets) is reserved.
>> - We separate decorators from log messages via the first space after a closing bracket in the line.
>> - A log message can contain any kind of symbol, and ends with a newline (except in the case of multiline messages).
>> - We prefix every continuation line of a multiline message (such as `B` in the example above) with the invalid decorator `[ ]`. The invalid decorator is as wide as the decorators on the first line, so the message text stays aligned for easy visual reading. For example:
>> 
>> [0s][gc] single-line message
>> [1s][gc] another single line message
>> [2s][gc] first line of a multiline message
>> [      ] second and last line of a multiline message
>> [3s][gc] another single line message
>> 
>> Note how this is both unambiguously parseable and human readable.
>> 
>> For the case where decorators have been disabled, the aforementioned points do not apply (i.e., behaviour remains the same as before). This means no multiline logical connection (such as the one presented above) and no way to separate decorators (the empty set, in this case) from messages. This is intentional as users specifying no ...
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Changed approach to avoid string duplication

LGTM, but please fix comment.

src/hotspot/share/logging/logFileStreamOutput.cpp line 135:

> 133:     // with each newline.
> 134:     const char* next = strstr(msg, "\n");
> 135:     while (next != NULL) {  // We have some newlines to print

Use nullptr, not NULL.
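
For illustration only, here is a minimal standalone sketch of a newline-splitting loop written with `nullptr`. It is not the code from `logFileStreamOutput.cpp`: the helper name `format_multiline`, its signature, and the use of `std::string` are assumptions made for the example, and it assumes decorators are enabled (a non-empty string such as `[2s][gc]`).

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not the HotSpot implementation): returns `msg` with
// `decorators` in front of the first line and a blank decorator of the same
// width, e.g. "[      ]", in front of every continuation line.
// Assumes `decorators` is non-empty and starts/ends with brackets.
std::string format_multiline(const std::string& decorators, const char* msg) {
  assert(msg != nullptr);
  // Blank decorator: '[' + padding + ']', as wide as the real decorators.
  const std::string blank = decorators.size() >= 2
      ? "[" + std::string(decorators.size() - 2, ' ') + "]"
      : decorators;

  std::string out;
  const std::string* prefix = &decorators;
  const char* line = msg;
  const char* next = strchr(line, '\n');
  while (next != nullptr) {  // We have some newlines to print
    out += *prefix;
    out += ' ';
    out.append(line, static_cast<size_t>(next - line));
    out += '\n';
    prefix = &blank;         // Later lines get the blank decorator
    line = next + 1;
    next = strchr(line, '\n');
  }
  // Print the remainder, unless the message simply ended with a newline.
  if (*line != '\0' || out.empty()) {
    out += *prefix + " " + line + "\n";
  }
  return out;
}
```

With this sketch, `format_multiline("[2s][gc]", "first\nsecond")` yields a first line starting with `[2s][gc]` and a second line starting with `[      ]`, matching the layout in the PR description.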

-------------

Marked as reviewed by jsjolen (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21712#pullrequestreview-2423172146
PR Review Comment: https://git.openjdk.org/jdk/pull/21712#discussion_r1833995922

