RFR: 8346118: Improve whitespace normalization in preformatted text

Hannes Wallnöfer hannesw at openjdk.org
Thu Mar 13 18:19:02 UTC 2025


On Wed, 5 Mar 2025 18:07:11 GMT, Jonathan Gibbons <jjg at openjdk.org> wrote:

>> Please review an enhancement to make `DocCommentParser` normalize whitespace inside `<pre>` elements. The normalization is conceptually simple and intended to be minimally invasive. Before parsing, `DocCommentParser` checks whether the text is a traditional doc comment and whether every line starts with a space character, which is commonly the case in traditional doc comments. If so, a single leading space is removed in block content (top level text and `{@code}`/`{@literal}` tags) when parsing within HTML `<pre>` tags.
>> 
>> This fixes the incidental one-space indentation in the vast majority of JDK code samples using `<pre>` alone or in combination with `<code>` or `{@code}`. In fact, I only found one code sample in JDK code that isn't solved by this change, for which I included a fix in this PR (it's in `String.startsWith(String, int)`, where I replaced the 10 char indentation and trailing line with a `<blockquote>`). 
>> 
>> The many added `boolean inBlockContent` arguments passed around in `DocCommentParser` are to make sure the removal is not applied to multiline inline content. That is maybe a bit fussy, considering there is not a lot of multiline inline content in `<pre>` tags and it would usually not be affected by the removal of a non-essential space character, but I wanted to keep the change minimal. A few javadoc tests had to be adapted; most of the testing is done in `test/langtools/tools/javac/doctree`.
>> 
>> If the exact amount of leading whitespace in `<pre>` tags is important to any javadoc user, the old output can be restored by increasing the indentation by 1. There will be a release note for this, of course.
>> 
>> Unfortunately, there is another whitespace problem that can't be solved as easily, and that is a leading blank line caused by `<pre><code>\n` open tags. Browsers will [ignore a newline immediately following a `<pre>` tag][1], but not if there is a `<code>` tag in between. There are hundreds of occurrences of this in JDK code, including variants with space characters mixed in. The fix in javadoc proper would be too complex, so I decided to solve it with 3 lines of JavaScript and a regex to reverse the order of `<code>\n` at the beginning of `<pre>` tags while removing any intermediary space. Script operation is indiscernible and it solves the problem.
>> 
>> [1]: https://html.spec.whatwg.org/#the-pre-element:the-pre-element
>
> As you indicated, there are two problems being addressed here, which might indicate the need for two separate patches. These issues are:
> 
> 1. The leading 1-space problem.
> 2. The trailing newline-after-<pre> problem
> 
> For the first, it is unduly hard work to fix this just for `<pre>` blocks. I still think that an overall better long-term solution would be to apply a conceptual `stripIndent` to the entire doc comment. This would bring traditional comments into line with the new Markdown comments, and can be done in just a few lines in `DocCommentParser`, and doing it there in DCP means you need not update `Elements.getDocComment`. If nothing else, I would suggest doing the experiment and comparing the generated docs, to verify there are no unexpected side effects. If there are any significant unexpected side effects, then your approach might deserve a second look. You could also make this a JDK-version-specific change if you wanted: meaning the new behavior does not apply to older JDK versions, although that is not a policy we have adhered to in the past.
> 
> For the second, I just feel that is a step too far, using JavaScript to clean up what some might consider to be bad input. Authors should either write HTML according to the HTML (and CSS?) specs, so that `javadoc` is just a "pass-through" layer, or authors should use a suitable construct, like `{@snippet...}`, that is "pleasing" to look at in source form while still generating the desired output.

After discussion with @jonathan-gibbons we have agreed that the two issues in this PR should be handled separately.
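
For readers following along, here is a rough Java sketch of the two issues as described in the quoted text. It is not the code in the PR: the class and method names are hypothetical, treating empty lines as acceptable in the "every line starts with a space" check is an assumption, and the second method only mirrors in Java what the PR describes doing with a JavaScript regex in the browser.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in javadoc.
final class PreWhitespaceSketch {

    // Issue 1: remove a single leading space from each line, but only if
    // every non-empty line starts with a space (assumption: empty lines
    // do not block the normalization).
    static String stripOneLeadingSpace(String text) {
        String[] lines = text.split("\n", -1);
        for (String line : lines) {
            if (!line.isEmpty() && line.charAt(0) != ' ') {
                return text; // at least one line has no leading space: leave as is
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(lines[i].isEmpty() ? lines[i] : lines[i].substring(1));
        }
        return sb.toString();
    }

    // Issue 2: "<pre><code>\n" renders a leading blank line because the
    // newline does not immediately follow <pre>. Move the newline in front
    // of <code> and drop spaces or tabs mixed in between.
    private static final Pattern PRE_CODE_NEWLINE =
            Pattern.compile("<pre>[ \\t]*<code>[ \\t]*\\r?\\n");

    static String moveNewlineBeforeCode(String html) {
        return PRE_CODE_NEWLINE.matcher(html).replaceAll("<pre>\n<code>");
    }

    public static void main(String[] args) {
        System.out.print(stripOneLeadingSpace(" int x;\n   int y;\n"));
        System.out.println(moveNewlineBeforeCode("<pre><code> \nint x;\n</code></pre>"));
    }
}
```

The first method removes at most one space per line, which is the narrow behavior the PR describes for `<pre>` content; a `stripIndent`-style approach applied to the whole comment would instead remove the common indentation of all lines.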

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23868#issuecomment-2722311586

