RFR: 8288298: Resolve multiline message parsing ambiguities in UL
David Holmes
dholmes at openjdk.org
Mon Oct 28 06:01:32 UTC 2024
On Fri, 25 Oct 2024 12:27:21 GMT, Antón Seoane <duke at openjdk.org> wrote:
> Tools such as JITWatch parse OpenJDK logs (e.g. logs generated from `LogCompilation`) in order to present interesting data to users. In order for these tools to work reliably, UnifiedLogging (UL) needs to have a consistent output scheme.
>
> Currently, a UL log message can be formatted in such a way that it looks like a UL decorator prefix, causing issues with parsing. This is because logging functions in UL do not prepend decorators (or any kind of prefix) to newlines in a log message. For example, `log_info(gc)("A\nB");` currently outputs
>
> [0s][gc] A
> B
>
> and we could mistakenly interpret `B` as a decorator. Additionally, developers may introduce pseudo-decorators (something that looks like a decorator but is actually part of the log message), yielding an incorrect parse. As a side remark, but also relevant, we hinder human readability when logs appear suddenly skewed at some points.
>
> The UL framework should offer a robust way to (a) distinguish decorators from messages, and (b) unambiguously group multiline output. This PR aims to achieve both goals through a subset of the changes proposed in [this mail](https://mail.openjdk.org/pipermail/hotspot-dev/2024-October/095284.html), from which the rest of the ideas can be done as future work. This means:
>
> - Decorators cannot contain the symbols `[` or `]`. The special decorator `[ ]` (with a variable number of whitespace between the brackets) is reserved.
> - We separate decorators from log messages via the first space after a closing bracket in the line.
> - A log message can contain any kind of symbols, and ends with a newline (except in the case for multiline messages).
> - We prepend multiline logging (such as in the example above) with the invalid decorator `[ ]`. The invalid decorator is as wide as the indentation of the rest of the log for easy visual reading. For example:
>
> [0s][gc] single-line message
> [1s][gc] another single line message
> [2s][gc] first line of a multiline message
> [ ] second and last line of a multiline message
> [3s][gc] another single line message
>
> Note how this is both unambiguously parseable and human readable.
>
> For the case where decorators have been disabled, the aforementioned points do not apply (i.e., behaviour remains the same as before). This means no multiline logical connection (such as the one presented above) and no way to separate decorators (the empty set, in this case) from messages. This is intentional as users specifying no decorators expect "raw" output. Additionally, it i...
I think this proposed change needs a bit more "socialisation" as it would appear that any existing UL parsers that cope with current multi-line behaviour will be broken by this change. That isn't to say the proposed change is not a reasonable idea, but we may need to give an adequate notice period to see the impact on those existing parsers. To that end a CSR request for "change in behaviour" should be filed.
A few comments/queries below.
Thanks.
src/hotspot/share/logging/logFileStreamOutput.cpp line 119:
> 117: }
> 118:
> 119: int LogFileStreamOutput::write_internal_line(const LogDecorations& decorations, const char* msg, int msg_len) {
Suggestion:
int LogFileStreamOutput::write_internal_lines(const LogDecorations& decorations, const char* msg, int msg_len) {
Not sure we can't just fold this altogether in `write_internal`.
src/hotspot/share/logging/logFileStreamOutput.cpp line 133:
> 131: written += decorator_padding;
> 132: WRITE_LOG_WITH_RESULT_CHECK(jio_fprintf(_stream, "%s\n", msg), written_msg);
> 133: while (written_msg < msg_len) {
Some comments here about how this handles multi-lines, would be good. Something like:
// If there is still more to write out then we must have a multi-line message.
// If we have active decorators then pad the line with an empty decorator string so
// that the output lines up.
src/hotspot/share/logging/logFileStreamOutput.cpp line 174:
> 172: // Handle multiline strings: split the string replacing newlines with terminators,
> 173: // and then force write_internal_line to print all of them (i.e. not stopping at the
> 174: // first null but until msg_len bytes are printed)
The string duplication with newline-to-nul conversion likely performs okay for typical short logging statements. But my initial approach would have been to scan through the string looking for the next newline and print it in chunks that way.
src/hotspot/share/logging/logFileStreamOutput.cpp line 179:
> 177: return 0;
> 178: }
> 179: ALLOW_C_FUNCTION(::memcpy, ::memcpy(dupstr, msg, msg_len + 1);)
Why not use `os::strdup`?
-------------
PR Review: https://git.openjdk.org/jdk/pull/21712#pullrequestreview-2397960324
PR Review Comment: https://git.openjdk.org/jdk/pull/21712#discussion_r1818416183
PR Review Comment: https://git.openjdk.org/jdk/pull/21712#discussion_r1818403454
PR Review Comment: https://git.openjdk.org/jdk/pull/21712#discussion_r1818395576
PR Review Comment: https://git.openjdk.org/jdk/pull/21712#discussion_r1818395983
More information about the hotspot-runtime-dev
mailing list