<!DOCTYPE html><html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body><div style="font-family: sans-serif;"><div class="markdown" style="white-space: normal;">

<p dir="auto">Full disclosure up front:  I dislike output formats which are 99% parseable, but fail to design for full disambiguation of all outputs.  We have some of this sort of technical debt in UL, and we should just fix it. That way tool vendors will stop bumping into this kind of thing.</p>

<p dir="auto">I’m going to give a number of opinions toward this goal.  As a group, they are the best way I know (in this moment) to comprehensively fix all the parsing problems.  Many of the opinions align with present UL realities (happily) and I hope we can adjust remaining UL realities to reach 100% unambiguous parsing.</p>

<p dir="auto">A. Decorators must be delimited in such a way that they cannot be confused with each other or with the following message line.</p>

<p dir="auto">A1. Therefore, decorator text must never contain the end-decorator character.</p>

<p dir="auto">A1a. For most robust, simple parsing, decorator text should not contain any other relevant delimiter character:  Begin decorator, begin message, newline.  (Yes, allowing newline would still allow decorators to be parsed but imagine the problems.  Just forbid any of “[] \n”.)</p>

<p dir="auto">A1b. To avoid off-by-one problems (and also do a good deed for multi-line outputs), decorator text must also be non-empty.  So no “[][2s] hello!”</p>
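<p dir="auto">As a rough illustration of A1a and A1b, here is a minimal Java sketch of a check an emitter could run on decorator text before writing it.  The class and method names are hypothetical; nothing like this is claimed to exist in UL today.</p>

```java
final class DecoratorTextCheck {
    /** True if the text is non-empty (A1b) and free of '[', ']', space, and newline (A1a). */
    static boolean isValidDecoratorText(String text) {
        if (text.isEmpty()) {
            return false;
        }
        for (char c : text.toCharArray()) {
            if (c == '[' || c == ']' || c == ' ' || c == '\n') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidDecoratorText("gc"));   // true
        System.out.println(isValidDecoratorText(""));     // false
        System.out.println(isValidDecoratorText("a b"));  // false
    }
}
```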

<p dir="auto">A2. Therefore, message text must never begin with the begin-decorator character.</p>

<p dir="auto">A2a. Message text should begin with its own delimiter, not just “any character other than begin-decorator”.  We use space today; good; this lets simple word-splitting isolate the message (as long as decorators cannot contain “ ”).</p>

<p dir="auto">B. Message text should be terminated by a newline, but should not be subject to any other parsing requirement.  Once you split at the first space, you have your message line, with no further decoding.</p>
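<p dir="auto">To make B concrete, here is a minimal sketch of the split at the first space.  It assumes decorators never contain a space (A1a); the class and method names are made up for illustration.</p>

```java
final class UlLineSplit {
    /** Returns {decoration prefix, message}; the message gets no further decoding. */
    static String[] split(String line) {
        int space = line.indexOf(' ');
        if (space == -1) {
            // No message delimiter: treat the whole line as prefix, with an empty message.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, space), line.substring(space + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("[0s][gc] hello [world]");
        System.out.println(parts[0]);  // [0s][gc]
        System.out.println(parts[1]);  // hello [world]
    }
}
```

<p dir="auto">A line that starts with a space yields an empty prefix, so a message may itself contain brackets without being mistaken for decorators.</p>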

<p dir="auto">B1. Thus, the occasional introduction of doubled backslashes and backslash-newline is a bad idea.  It just introduces more ambiguities.  (“Which grammar was I parsing? Oh, THAT one!?”)</p>

<p dir="auto">B2. If message text contains embedded newlines, they should be unambiguously marked, so that the newline that terminates the whole message can be found.</p>

<p dir="auto">C. In the setting of UL, the best way to mark a continuation line is to vary the syntax at the BEGINNING of the following line, not the END of the preceding line.  This is because UL already has heavy parsing activity at the beginnings of lines; there is no good reason to add more parsing activity elsewhere.</p>

<p dir="auto">C1. The format for a continuation line should be some decorator-like syntax that is not exactly legal as a real decorator, and so cannot be confused with it.  Something like “[] second line” or “[ ] second line” or the like.  If it were “[ ]” (begin-decorator, space, end-decorator) then a buggy line-split that was forgetting to look for continuation lines would produce “] second line” as the message, which is a good clue about what went wrong.</p>
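<p dir="auto">A small sketch of the C1 argument, assuming the “[ ]” marker is the one chosen: a correct reader checks for the marker first, while a naive first-space split leaves a telltale “]” at the front of the message.  The names here are hypothetical.</p>

```java
final class ContinuationMarkDemo {
    // Assumed continuation marker: begin-decorator, space, end-decorator.
    static final String CONTINUATION_MARK = "[ ]";

    static boolean isContinuation(String line) {
        return line.startsWith(CONTINUATION_MARK);
    }

    /** A naive split that forgets to look for continuation lines. */
    static String naiveMessage(String line) {
        int space = line.indexOf(' ');
        return space == -1 ? "" : line.substring(space + 1);
    }

    public static void main(String[] args) {
        String line = "[ ] second line";
        System.out.println(isContinuation(line));  // true
        System.out.println(naiveMessage(line));    // ] second line
    }
}
```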

<p dir="auto">D. UL lines, along with their associated continuation lines, should never interrupt each other.  Concurrent output should be arranged so that each line (with its continuation lines) precedes or follows (does not interrupt) its neighboring lines (along with THEIR associated continuation lines).</p>

<p dir="auto">D1. If continuation lines are very difficult to keep with their leading UL lines, then we should consider adjusting the syntax to allow decorations which help match up a line with its continuations.  This seems to require an ID number, which would ideally be given a characteristic syntax distinct from other decorators.  Something like “[#1]”, with “#” illegal in other decorators (see A1a above).</p>

<p dir="auto">E. UL is designed for both human readers and mechanical parsers.  The above points support mechanical parsers, including very simple ones, and do not impair human readers either.</p>

<p dir="auto">Examples (without ID numbers):</p>

<pre style="margin-left: 15px; margin-right: 15px; padding: 5px; background-color: #F7F7F7; border-radius: 5px 5px 5px 5px; overflow-x: auto; max-width: 90vw;"><code style="margin: 0; border-radius: 3px; background-color: #F7F7F7; padding: 0px;">[foo][bar] this is the first line
[ ] and this is the second
[ ] and this is the third
 [not a decorator] this line has no decorators, and stands alone
 this is the first of two, again without decorators
[ ] this is the promised second
</code></pre>

<p dir="auto">Note there is no “\n” or “\”.  Those complicate parsers and are hard to read by humans as well.</p>
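<p dir="auto">Grouping lines with their continuations, in the ID-free format, might look like the following sketch.  It assumes D holds (continuations directly follow their leading line) and that “[ ]” is the marker; the class name and the choice to skip an orphan continuation are mine, not part of any proposal.</p>

```java
final class ContinuationJoiner {
    static final String CONTINUATION_MARK = "[ ]";

    /** Text after the first space, or "" if there is no space. */
    private static String afterFirstSpace(String s) {
        int space = s.indexOf(' ');
        return space == -1 ? "" : s.substring(space + 1);
    }

    /** Returns one entry per message, with continuation text joined by '\n'. */
    static String[] joinMessages(String[] lines) {
        int count = 0;
        for (String line : lines) {
            if (!line.startsWith(CONTINUATION_MARK)) {
                count++;
            }
        }
        String[] messages = new String[count];
        int current = -1;
        for (String line : lines) {
            if (line.startsWith(CONTINUATION_MARK)) {
                if (current == -1) {
                    continue;  // orphan continuation; skipped in this sketch
                }
                String rest = line.substring(CONTINUATION_MARK.length());
                messages[current] += "\n" + (rest.startsWith(" ") ? rest.substring(1) : rest);
            } else {
                messages[++current] = afterFirstSpace(line);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        String[] lines = { "[foo][bar] first", "[ ] second", " stands alone" };
        for (String message : joinMessages(lines)) {
            System.out.println("message: " + message);
        }
    }
}
```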

<p dir="auto">With ID numbers (which link together multi-line messages):</p>

<pre style="margin-left: 15px; margin-right: 15px; padding: 5px; background-color: #F7F7F7; border-radius: 5px 5px 5px 5px; overflow-x: auto; max-width: 90vw;"><code style="margin: 0; border-radius: 3px; background-color: #F7F7F7; padding: 0px;">[foo][bar][#42] this is the first line
[ ][#42] and this is the second
 [not a decorator] this line has no decorators, and stands alone
[#99] this is the first of two, again without decorators
[ ][#99] this is the promised second
[ ][#42] and this is the third (for the first line; it got lost in concurrency)
</code></pre>
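<p dir="auto">With IDs, a reader can reassemble a message even when its continuations are separated by other output.  The following is only a sketch under assumptions: the ID is the last decorator on a leading line, decorator text excludes “[”, “]”, space, and newline (A1a), and the class and method names are hypothetical.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SequenceIdCollector {
    // Leading line: zero or more decorators, then the space that begins the message.
    private static final Pattern LEADING =
            Pattern.compile("((?:\\[[^\\[\\] \\n]+\\])*) ");
    // Continuation line: "[ ]", a sequence ID, then an optional space.
    private static final Pattern CONTINUATION =
            Pattern.compile("\\[ \\](\\[#[0-9]+\\]) ?");

    /** Joins, in order of appearance, the message lines tagged with the given ID. */
    static String collect(String[] lines, int id) {
        String tag = "[#" + id + "]";
        StringBuilder message = new StringBuilder();
        for (String line : lines) {
            String text = null;
            Matcher c = CONTINUATION.matcher(line);
            Matcher l = LEADING.matcher(line);
            if (c.lookingAt()) {
                if (c.group(1).equals(tag)) {
                    text = line.substring(c.end());
                }
            } else if (l.lookingAt()) {
                if (l.group(1).endsWith(tag)) {
                    text = line.substring(l.end());
                }
            }
            if (text != null) {
                if (message.length() > 0) {
                    message.append('\n');
                }
                message.append(text);
            }
        }
        return message.toString();
    }

    public static void main(String[] args) {
        String[] lines = {
            "[gc][#42] first", "[ ][#42] second", "[#99] other", "[ ][#42] third"
        };
        System.out.println(collect(lines, 42));
    }
}
```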

<p dir="auto">Here are some possible regexes:</p>

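<pre style="margin-left: 15px; margin-right: 15px; padding: 5px; background-color: #F7F7F7; border-radius: 5px 5px 5px 5px; overflow-x: auto; max-width: 90vw;"><code style="margin: 0; border-radius: 3px; background-color: #F7F7F7; padding: 0px;">    // Regexes to recognize and strip decorations.
    // One character of decorator text: anything except ']', space, or newline.
    public static final String DECORATOR_CHAR = "[^] \n]";
    // One bracketed decorator with non-empty text, e.g. "[gc]".
    public static final String ONE_DECORATION = "\\[(" + DECORATOR_CHAR + "+)\\]";
    // Zero or more decorators at the start of the line, then the optional space that begins the message.
    public static final String DECORATION_PREFIX = "\\A(" + ONE_DECORATION + ")* ?";
    public static final String FIRST_DECORATION = "\\A" + ONE_DECORATION;
    // A sequence ID such as "[#42]", linking a line to its continuation lines.
    public static final String SEQUENCE_ID = "\\[#[0-9]+\\]";
    // The continuation marker "[ ]", an optional sequence ID, and an optional space.
    public static final String CONTINUATION_PREFIX = "\\A\\[ \\](" + SEQUENCE_ID + ")? ?";
</code></pre>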

<p dir="auto">For simplicity the syntax allows decorators which look like sequence IDs, but they should not be emitted, unless they really are sequence IDs.</p>
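<p dir="auto">One way the regexes above could be applied to strip the prefix from a single line is sketched below.  The regex strings are copied from this message; the class name, the stripPrefix method, and the order of the two checks are my own assumptions and are not taken from the linked test code.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogStripSketch {
    static final String DECORATOR_CHAR = "[^] \n]";
    static final String ONE_DECORATION = "\\[(" + DECORATOR_CHAR + "+)\\]";
    static final String DECORATION_PREFIX = "\\A(" + ONE_DECORATION + ")* ?";
    static final String SEQUENCE_ID = "\\[#[0-9]+\\]";
    static final String CONTINUATION_PREFIX = "\\A\\[ \\](" + SEQUENCE_ID + ")? ?";

    private static final Pattern CONTINUATION = Pattern.compile(CONTINUATION_PREFIX);
    private static final Pattern DECORATION = Pattern.compile(DECORATION_PREFIX);

    /** Returns the message text of one line, with decorators or continuation marker removed. */
    static String stripPrefix(String line) {
        // Check for a continuation first; "[ ]" is never a legal decorator.
        Matcher c = CONTINUATION.matcher(line);
        if (c.find()) {
            return line.substring(c.end());
        }
        Matcher d = DECORATION.matcher(line);
        return d.find() ? line.substring(d.end()) : line;
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("[foo][bar] this is the first line"));
        System.out.println(stripPrefix("[ ][#42] and this is the second"));
    }
}
```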

<p dir="auto">Test code: <a href="https://cr.openjdk.org/~jrose/scripts/LogStripTest.java.txt" style="color: #3983C4;">https://cr.openjdk.org/~jrose/scripts/LogStripTest.java.txt</a></p>

<p dir="auto">I hope this helps.  Thanks for working on this stuff, it’s important.</p>

<p dir="auto">— John</p>

<p dir="auto">On 7 Oct 2024, at 6:50, Anton Seoane Ampudia wrote:</p>

</div><blockquote class="embedded" style="margin: 0 0 5px; padding-left: 5px; border-left: 2px solid #777777; color: #777777;"><div id="87CDFE52-2443-498E-ABE5-8ED527E48934"><style scoped="">

@font-face <!-- {

  font-family: "Cambria Math";

  panose-1: 2 4 5 3 5 4 6 3 2 4;}

@font-face {

  font-family: Aptos;

  panose-1: 2 11 0 4 2 2 2 2 2 4;}

a:link,

span.MsoHyperlink {

  mso-style-priority: 99;

  color: #467886;

  text-decoration: underline;

}

</style>


<div lang="en-SE" link="#467886" vlink="#96607D" style="word-wrap:break-word">

<div class="WordSection1" style="page: WordSection1;">

<p class="MsoNormal"><span lang="ES">Hi all,</span></p>

<p class="MsoNormal"><span lang="ES"> </span></p>

<p class="MsoNormal"><span lang="EN-US">During the migration of compiler logs to the UnifiedLogging framework, I have observed that multiline logging does not include decorators for all the lines, instead only adding them for the first one and leaving the rest “dangling”. I have found out that this is already a reported issue in <a href="https://bugs.openjdk.org/browse/JDK-8288298">JDK-8288298</a>, and written a tentative fix for it.</span></p>

<p class="MsoNormal"><span lang="EN-US"> </span></p>

<p class="MsoNormal"><span lang="EN-US">Initial testing shows insignificant performance changes for normal logging use cases, but before going forward with it I would like to request comments and opinions on this. As far as I know, it would somewhat simplify “manual reading” of logs, since everything would start in the same column, as well as automated parsing, since there would be no line ambiguities. Copying from the JBS description:</span></p>

<p class="MsoNormal"><span lang="EN-US"> </span></p>

<p class="MsoNormal"><span lang="EN-US">> log_info(gc)("A\nB"); currently outputs:</span></p>

<p class="MsoNormal"><span lang="EN-US">> [0s][gc] A</span></p>

<p class="MsoNormal"><span lang="EN-US">> B</span></p>

<p class="MsoNormal"><span lang="EN-US">> And after this change will output:</span></p>

<p class="MsoNormal"><span lang="EN-US">> [0s][gc] A</span></p>

<p class="MsoNormal"><span lang="EN-US">> [1s][gc] B</span></p>

<p class="MsoNormal"><span lang="EN-US">></span></p>

<p class="MsoNormal"><span lang="EN-US">> This change allows UL to be parsed by regex. Example for per-line parsing:</span></p>

<p class="MsoNormal"><span lang="EN-US">></span></p>

<p class="MsoNormal"><span lang="EN-US">> ^\[ [^\[\]]* \] \[ [^\[\]]* \] (\[ [^\[\]]* \])?</span></p>

<p class="MsoNormal"><span lang="EN-US"> </span></p>

<p class="MsoNormal"><span lang="EN-US">It is worth mentioning that the special case with -Xlog:foldmultilines=true is not affected by this (i.e., if foldmultilines is set to true we do not carry out the line-by-line decorating).</span></p>

<p class="MsoNormal"><span lang="EN-US"> </span></p>

<p class="MsoNormal"><span lang="EN-US">Thanks,</span></p>

<p class="MsoNormal"><span lang="EN-US">Antón</span></p>

</div>

</div></div></blockquote>

<div class="markdown" style="white-space: normal;">


</div></div></body>


</html>

-->