RFR: JDK-8268335: Find better way to exclude empty HTML elements [v2]
Jonathan Gibbons
jjg at openjdk.java.net
Tue Feb 8 17:12:09 UTC 2022
On Fri, 21 Jan 2022 09:18:59 GMT, Hannes Wallnöfer <hannesw at openjdk.org> wrote:
>> Please review a change to straighten out the mechanism to avoid generating empty HTML elements in javadoc.
>>
>> Currently this is implemented using the `Content.isValid()` method in `HtmlTree.add(Content)` to check whether the the content argument should be added or dropped. This seems wrong since we don't really want to check for validity but non-emptiness (we want to drop empty elements, not print a warning or error).
>>
>> So the first decision was to get rid of the `Content.isValid()` method and use `Content.isEmpty()` instead. I considered keeping `isValid()` and using it to actually do some validation checks, but in the end decided it didn't make much sense and scrapped the method.
>>
>> My initial approach was to take care of emptiness at the content creation site, but it soon turned out there are too many sites potentially creating empty content. After fixing 40+ sites using a new `Content.addNonEmpty(Content)` method I still got empty elements that shouldn't be there in the JDK documentation.
>>
>> Therefore `HtmlTree.add(Content)` still checks for and drops empty content arguments. Instead, I added a new `HtmlTree.addUnchecked(Content)` method that adds the argument without the checks. The new method is only used 5 times in the whole codebase. It replaces the use of the special `HtmlTree.EMPTY` content token which is no longer needed.
>>
>> A few remarks about the rewritten `HtmlTree.isEmpty()` method: its purpose is to determine whether content is missing in an element that should have content. It therefore always returns `false` for elements that are expected or allowed to be without content. This is maybe a bit counter-intuitive, but it is what is needed. We could do a combined check such as `isVoid() || !isEmpty()` but that would still leave out elements that *can* be empty, such as `<script>`.
>>
>> The implementation of `HtmlTree.isEmpty()` is deliberately simple and conservative. For instance, it doesn't look at attributes to decide whether an element is allowed to be empty. The rationale for this is to make the behavior as predictable as possible and avoid surprises.
>>
>> Since we no longer have a validity check in `HtmlTree.add(Content)` I had to remove a few lines of code in `TestHtmlDocument.java` that expected invalid tags to be dropped. Otherwise there are no test changes, as this is a noreg-cleanup task. I did compare the javadoc JDK documention before and after this change, the output is unchanged.
>
> Hannes Wallnöfer has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>
> - Merge branch 'master' into JDK-8268335
> - JDK-8268335: Remove unnecessary if statement
> - JDK-8268335: Find better way to exclude empty HTML elements
> A few remarks about the rewritten HtmlTree.isEmpty() method: its purpose is to determine whether content is missing in an element that should have content. It therefore always returns false for elements that are expected or allowed to be without content. This is maybe a bit counter-intuitive, but it is what is needed. We could do a combined check such as isVoid() || !isEmpty() but that would still leave out elements that can be empty, such as <script>.
I agree the current situation is not great, and needs revision, so thanks for taking this on.
While it may be hard to completely remove the creation and use of empty elements, we should at least try and reduce the number. The current state is a hangover from very-old javadoc, which used print-to-stream before even `HtmlTree`.
Having `isEmpty` return true for some empty elements is definitely counter-intuitive and suggests the name is wrong.
This seems better suited for `isValid` with a definition involving `isVoid` and `isEmpty` as you suggested. I don't quite understanding the comment about `<script>`. Maybe you are wanting a way to mark elements which should not be considered empty, as in "this space intentionally left blank". IIRC, that was the original purpose of `HtmlTree.EMPTY` so perhaps some sort of marker element can stay, perhaps with a better name, to mark HTML elements which should not be marked invalid because they appear to be empty.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7159
More information about the javadoc-dev
mailing list