RFR: JDK-8268335: Find better way to exclude empty HTML elements [v5]
Jonathan Gibbons
jjg at openjdk.java.net
Mon Mar 14 18:30:47 UTC 2022
On Tue, 22 Feb 2022 14:46:27 GMT, Hannes Wallnöfer <hannesw at openjdk.org> wrote:
>> Please review a change to straighten out the mechanism to avoid generating empty HTML elements in javadoc.
>>
>> Currently this is implemented using the `Content.isValid()` method in `HtmlTree.add(Content)` to check whether the the content argument should be added or dropped. This seems wrong since we don't really want to check for validity but non-emptiness (we want to drop empty elements, not print a warning or error).
>>
>> So the first decision was to get rid of the `Content.isValid()` method and use `Content.isEmpty()` instead. I considered keeping `isValid()` and using it to actually do some validation checks, but in the end decided it didn't make much sense and scrapped the method.
>>
>> My initial approach was to take care of emptiness at the content creation site, but it soon turned out there are too many sites potentially creating empty content. After fixing 40+ sites using a new `Content.addNonEmpty(Content)` method I still got empty elements that shouldn't be there in the JDK documentation.
>>
>> Therefore `HtmlTree.add(Content)` still checks for and drops empty content arguments. Instead, I added a new `HtmlTree.addUnchecked(Content)` method that adds the argument without the checks. The new method is only used 5 times in the whole codebase. It replaces the use of the special `HtmlTree.EMPTY` content token which is no longer needed.
>>
>> A few remarks about the rewritten `HtmlTree.isEmpty()` method: its purpose is to determine whether content is missing in an element that should have content. It therefore always returns `false` for elements that are expected or allowed to be without content. This is maybe a bit counter-intuitive, but it is what is needed. We could do a combined check such as `isVoid() || !isEmpty()` but that would still leave out elements that *can* be empty, such as `<script>`.
>>
>> The implementation of `HtmlTree.isEmpty()` is deliberately simple and conservative. For instance, it doesn't look at attributes to decide whether an element is allowed to be empty. The rationale for this is to make the behavior as predictable as possible and avoid surprises.
>>
>> Since we no longer have a validity check in `HtmlTree.add(Content)` I had to remove a few lines of code in `TestHtmlDocument.java` that expected invalid tags to be dropped. Otherwise there are no test changes, as this is a noreg-cleanup task. I did compare the javadoc JDK documention before and after this change, the output is unchanged.
>
> Hannes Wallnöfer has updated the pull request incrementally with one additional commit since the last revision:
>
> Introduce Content.isDiscardable()
Generally nice.
It is still weird to me that we might create empty placeholders and then throw them away, but given that we do, this seems like a better (and more explicit) solution than what we had before.
I also like the mildly-unrelated cleanup for HtmlTree.SPAN.
There is one question about why `HtmlTree.A` might reasonably be empty in HTML 5. The only reason I can think of is to declare an `id`, since an `href` with no content is unclickable. But a node with just an `id` and no content can apply to other nodes as well, can't it? e.g. `span`
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/SourceToHTMLConverter.java line 318:
> 316: HtmlIds.forLine(currentLineNo),
> 317: Text.of(utils.replaceTabs(line)));
> 318: pre.addUnchecked(anchor);
Why does this need to be `addUnchecked` ?
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/markup/HtmlTree.java line 1017:
> 1015: * Returns {@code true} if the HTML tree does not affect the output and can be discarded.
> 1016: * This implementation considers non-void elements without content as discardable, with the
> 1017: * exception of {@code SCRIPT} and {@code A} which can sometimes be used without content.
In HTML 5, when is it reasonable to have `A` without content?
-------------
PR: https://git.openjdk.java.net/jdk/pull/7159
More information about the javadoc-dev
mailing list