RFR: JDK-8236048: Cleanup use of Utils.normalizeNewlines

Jonathan Gibbons jjg at openjdk.org
Tue Aug 16 21:39:16 UTC 2022


On Mon, 8 Aug 2022 10:15:41 GMT, Hannes Wallnöfer <hannesw at openjdk.org> wrote:

>> Please review a medium-size cleanup related to handling newlines in `Content`/`HtmlTree`/`Text[Builder]` trees.
>> This is the 3rd and last in the recent series of cleanup tasks in the code to generate HTML.
>> 
>> Until recently, the `Text`/`TextBuilder` objects were "ready-for-writing", meaning that characters like `<`, `&`, etc were encoded as entities, and newlines were represented by the platform line separator. This caused difficulties when counting the size of generated text, since these characters had to be detected and accounted for.  A recent change addressed entities, but stayed clear of the newline issue, which is now addressed here.
>> 
>> A related performance problem is that the method `Utils.normalizeNewlines` always created new strings, even when it was not necessary.
>> 
>> With this change, newlines in `Text`, `TextBuilder` and other subtypes of `Content` containing text are always represented by `\n`, and are converted to the platform line separator as needed. 
>> 
>> I note the following assumptions:
>> 
>> * the input may contain any flavor of newlines in the user-provided documentation comments
>> * the output should aways use the platform line separator (`\n` on Linux and macOS, `\r\n` on Windows etc.)
>> 
>> Together, these imply that some amount of the input will need to be scanned to normalize newlines at some point in the process. While it would be nice to not have to normalize newlines, it's a case or "pay now, or pay later". 
>> 
>> Notable parts of this change:
>> 
>> * `Utils.normalizeNewlines` is moved to be a static method in `Text` and rewritten to not create new strings unless needed. This allows a number of questionable imports of `toolkit.utils.DocletConstants` to be removed from the `html.markup` package.  While it would be possible to call this method automatically on all strings passed to `Text`, `TextBuilder` etc, in many cases it is not necessary: put another way, it only needs to be called on strings that have come from the user, either in documentation comments or in command-line options.
>> * `Content.write`, and the implementations thereof, are given a new parameter which gives the newline sequence to use. This is set to `\n` for `toString` and the platform separator when writing to a file.
>> * `DocletConstants.NL` goes away and is superseded by:
>>    *  `Text.NL`: always `\n`, to be used when adding text to a `Content` tree (this replaces most uses of `DocletConstants.NL`)
>>    * `DocFile.PLATFORM_LINE_SEPARATOR`: the platform line separator, to be used when writing to files
>> * The various `charCount` methods no longer have to worry about use of the platform line separator 
>> * Assertions are used to verify that there are no `\r` characters added into `Text`/`TextBuilder`/`RawHtml`/`Content`
>> 
>> Other cleanup:
>> * `DocletConstants.DEFAULT_TAB_STOP_LENGTH` is moved to `BaseOptions` which is the only class that uses it. This leaves just two remaining constants in `DocletConstants` which should probably be moved elsewhere as well, and the `DocletConstants` class deleted. That is a step too far for the work here.
>> * Minor IDE-suggestions, for lambdas, enhanced switch, adding braces, etc
>> 
>> One test was affected by the `DocletConstants.NL` change, but apart from that, this is a pure-cleanup change, with no (intentional) externally visible effect.
>
> src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/markup/RawHtml.java line 161:
> 
>> 159:                             }
>> 160:                             count++;
>> 161:                             break;
> 
> The RawHtml constructor contains `assert Text.checkNewlines(rawHtml)` so I'm not sure this is actually needed.

I'm not sure what you are referring to here. Yes, there is the assertion about newlines, but you still have to scan the `rawHtml` for start and end elements, and entities.  Note there is no longer any special handling of newline characters (`\r`, `\n` etc)

-------------

PR: https://git.openjdk.org/jdk/pull/9691


More information about the javadoc-dev mailing list