RFR: 8335122: Reorganize internal low-level support for HTML in jdk.javadoc [v4]

Hannes Wallnöfer hannesw at openjdk.org
Fri Jul 26 10:54:38 UTC 2024


On Wed, 24 Jul 2024 22:09:46 GMT, Jonathan Gibbons <jjg at openjdk.org> wrote:

>> Please review a change to reorganize the internal low-level support for HTML in the jdk.javadoc module.
>> 
>> Hitherto, there have been two separate sets of classes for low-level support for HTML in the `jdk.javadoc` module: one, in `doclint`, focused on reading and checking classes; the other, in the standard doclet, focused on generating HTML. This PR merges those two sets into a new package, `jdk.javadoc.internal.html`, that is now used by both `doclint` and the standard doclet.
>> 
>> There was a naming "anti-clash" -- `HtmlTag` in `doclint` vs `TagName` in the standard doclet. The resolution is to use `HtmlTag`, since the merged class is more than just the tag name.
>> 
>> A few minor bugs were found and fixed. Other minor cleanup was done, but otherwise, there should be no big surprises here. But, one small item of note: `enum HtmlStyle` was split into `interface HtmlStyle` and `enum HtmlStyles implements HtmlStyle` to avoid having a doclet-specific enum class in the new `internal.html` package. The naming follows `HtmlId` and `HtmlIds`.
>> 
>> There is no attempt at this time to simplify `HtmlTag` and `HtmlAttr` to remove support for older versions of HTML.
>
> Jonathan Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Cleanup use of HtmlStyle and HtmlStyles

src/jdk.javadoc/share/classes/jdk/javadoc/internal/html/HtmlTag.java line 87:

> 85:             attrs(AttrKind.HTML4, CLEAR)),
> 86: 
> 87:     BUTTON(BlockType.OTHER, EndKind.REQUIRED,

Several tag constants in this enum that use `BlockType.OTHER` are defined as [Phrasing Content](https://html.spec.whatwg.org/#phrasing-content) in the HTML5 spec. Since HTML5 phrasing content roughly corresponds to pre-HTML5 inline content, these tags should use `BlockType.INLINE` here. This includes the following tags:

 - BUTTON
 - INPUT
 - LABEL
 - LINK
 - SCRIPT

These tags were also flagged as `phrasingContent` in the old doclet `TagName` enum. I'm not sure whether marking them as `INLINE` content will break DocLint tests.
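
To make the suggestion concrete, here is a minimal, self-contained sketch of the proposed classification. It is not the real `HtmlTag`: the class name `BlockTypeSketch`, the nested `Tag` enum, the reduced `BlockType` enum, and the `inlineTags()` helper are all hypothetical, and `P` and `HTML` are included only for contrast. It leaves out `EndKind` and the attribute declarations entirely.

```java
import java.util.EnumSet;
import java.util.Set;

public class BlockTypeSketch {
    // Reduced stand-in for the block classification; an assumption, not the real nested enum.
    enum BlockType { BLOCK, INLINE, OTHER }

    // Hypothetical, stripped-down tag enum covering the five tags listed above,
    // plus two extra tags for contrast.
    enum Tag {
        BUTTON(BlockType.INLINE),   // proposed: INLINE instead of OTHER
        INPUT(BlockType.INLINE),
        LABEL(BlockType.INLINE),
        LINK(BlockType.INLINE),
        SCRIPT(BlockType.INLINE),
        P(BlockType.BLOCK),         // contrast: a block-level tag
        HTML(BlockType.OTHER);      // contrast: neither block nor inline

        final BlockType blockType;

        Tag(BlockType blockType) {
            this.blockType = blockType;
        }
    }

    // Collects every tag classified as inline, in declaration order.
    static Set<Tag> inlineTags() {
        EnumSet<Tag> result = EnumSet.noneOf(Tag.class);
        for (Tag t : Tag.values()) {
            if (t.blockType == BlockType.INLINE) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [BUTTON, INPUT, LABEL, LINK, SCRIPT]
        System.out.println(inlineTags());
    }
}
```

A checker that only accepts inline content inside, say, a paragraph would then accept these five tags, which is the behavior the old `phrasingContent` flag implied.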

It might seem like a good idea to use [HTML5 content categories](https://developer.mozilla.org/en-US/docs/Web/HTML/Content_categories) in the new merged code, but those categories are more complex and overlapping, and they don't cover list and table content, so there is not much to gain beyond more up-to-date terminology.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19916#discussion_r1692888920

