RFR: JDK-8236048: Cleanup use of Utils.normalizeNewlines

Jonathan Gibbons jjg at openjdk.org
Tue Aug 16 21:32:34 UTC 2022


On Mon, 8 Aug 2022 10:03:05 GMT, Hannes Wallnöfer <hannesw at openjdk.org> wrote:

>> Please review a medium-size cleanup related to handling newlines in `Content`/`HtmlTree`/`Text[Builder]` trees.
>> This is the 3rd and last in the recent series of cleanup tasks in the code to generate HTML.
>> 
>> Until recently, the `Text`/`TextBuilder` objects were "ready-for-writing", meaning that characters like `<`, `&`, etc were encoded as entities, and newlines were represented by the platform line separator. This caused difficulties when counting the size of generated text, since these characters had to be detected and accounted for.  A recent change addressed entities, but stayed clear of the newline issue, which is now addressed here.
>> 
>> A related performance problem is that the method `Utils.normalizeNewlines` always created new strings, even when it was not necessary.
>> 
>> With this change, newlines in `Text`, `TextBuilder` and other subtypes of `Content` containing text are always represented by `\n`, and are converted to the platform line separator as needed. 
>> 
>> I note the following assumptions:
>> 
>> * the input may contain any flavor of newlines in the user-provided documentation comments
>> * the output should aways use the platform line separator (`\n` on Linux and macOS, `\r\n` on Windows etc.)
>> 
>> Together, these imply that some amount of the input will need to be scanned to normalize newlines at some point in the process. While it would be nice to not have to normalize newlines, it's a case or "pay now, or pay later". 
>> 
>> Notable parts of this change:
>> 
>> * `Utils.normalizeNewlines` is moved to be a static method in `Text` and rewritten to not create new strings unless needed. This allows a number of questionable imports of `toolkit.utils.DocletConstants` to be removed from the `html.markup` package.  While it would be possible to call this method automatically on all strings passed to `Text`, `TextBuilder` etc, in many cases it is not necessary: put another way, it only needs to be called on strings that have come from the user, either in documentation comments or in command-line options.
>> * `Content.write`, and the implementations thereof, are given a new parameter which gives the newline sequence to use. This is set to `\n` for `toString` and the platform separator when writing to a file.
>> * `DocletConstants.NL` goes away and is superseded by:
>>    *  `Text.NL`: always `\n`, to be used when adding text to a `Content` tree (this replaces most uses of `DocletConstants.NL`)
>>    * `DocFile.PLATFORM_LINE_SEPARATOR`: the platform line separator, to be used when writing to files
>> * The various `charCount` methods no longer have to worry about use of the platform line separator 
>> * Assertions are used to verify that there are no `\r` characters added into `Text`/`TextBuilder`/`RawHtml`/`Content`
>> 
>> Other cleanup:
>> * `DocletConstants.DEFAULT_TAB_STOP_LENGTH` is moved to `BaseOptions` which is the only class that uses it. This leaves just two remaining constants in `DocletConstants` which should probably be moved elsewhere as well, and the `DocletConstants` class deleted. That is a step too far for the work here.
>> * Minor IDE-suggestions, for lambdas, enhanced switch, adding braces, etc
>> 
>> One test was affected by the `DocletConstants.NL` change, but apart from that, this is a pure-cleanup change, with no (intentional) externally visible effect.
>
> src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/markup/HtmlDocument.java line 80:
> 
>> 78:         writer.write(docType.text);
>> 79:         writer.write(newline);
>> 80:         docContent.write(writer, newline,true);
> 
> Missing space after comma

Fixed.  
This is a mildly annoying side-effect of IDEA wanting to label boolean arguments: it makes the missing spaces much harder to see.

> src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/markup/Text.java line 51:
> 
>> 49:      */
>> 50:     public static Text of(CharSequence content) {
>> 51:         assert checkNewlines(content);
> 
> Redundant assert as private constructor also contains it.

Fixed

-------------

PR: https://git.openjdk.org/jdk/pull/9691


More information about the javadoc-dev mailing list