RFR: JDK-8288699: cleanup HTML tree in HtmlDocletWriter.commentTagsToContent
Hannes Wallnöfer
hannesw at openjdk.org
Tue Jun 28 17:02:45 UTC 2022
On Sun, 19 Jun 2022 23:58:26 GMT, Jonathan Gibbons <jjg at openjdk.org> wrote:
> Please review a particularly satisfying review of a cleanup for
> `HtmlDocletWriter.commentTagsToContent`, and the use of `RawHtml`.
>
> The background for this is a separate cleanup for `CommentHelper.getText`,
> during which it was observed that the code could be eliminated, with some
> uses being replaced by `HtmlDocletWriter.commentTagsToContent`. That led
> to more tags being processed, which in turn led to the observation that
> `Content.charCount` was often giving incorrect results.
>
> The primary root cause of the problem with `Content.charCount` was the
> handling of `StartElementTree` in the visitor in `commentTagsToContent`.
> The problem is that code there creates a sequence of text nodes for the
> attributes, preceded by a `RawHtml` object containing `<TAGNAME` and followed
> by another separate `RawHtml` node containing just `>`. The net result
> is that `charCount` is mistaken into including the size of the intervening
> text nodes in the total, because it does not recognize the "unbalanced" `<`
> and `>` in the surrounding nodes.
>
> The general system fix for `commentTagsToContent` is to change the _visit..._
> methods to always append to the `Content` object supplied as a parameter, instead
> of to a fixed local variable (`result`) in the enclosing scope. This allows
> `visitStartElement` to accumulate the attributes in a temporary buffer,
> and then create a single `RawHtml` node as the translation of the
> `StartElementTree`. Changing the methodology of the visitors also
> required disentangling the processing of `{@docRoot}`, which generally
> always appears inside the `href` attribute, typically followed by
> the rest of the URL. The complication there is that since forever there
> has been special handling of `{@docRoot}/..` which can be replaced by
> the value of the `--docrootparent` option. In reviewing that part of the code,
> it may help to just ignore the old code and just review the new replacement
> code.
>
> Starting with the code in `commentTagsToContent`, it became worthwhile to
> use factory methods on `RawHtml` for stylized uses of the class, for _start
> element_, _end element_, and _comment_. These all have a `charCount` of zero,
> so it only becomes necessary to compute the character count when given
> arbitrary text.
>
> Related, the `Text` and `TextBuilder` classes are changed so that they
> contain unescaped content, and only escape the HTML characters when they
> are written out. This allows their `charCount` to simply return the `length()`
> of the string content, instead of having to parse the content to account
> of any entities that may be present. This also means that newline characters
> will be counted; previously, they were not.
>
> Another noteworthy change. Somewhat usually, some resource strings for the
> description of mandated methods, such as for enum and record classes, contain
> simple HTML (`<i>...</i>`), which is modelled in a single `TextTree` and
> subsequently converted to a non-standard use of `RawHtml`. To avoid
> significantly changing the resource strings, the solution is to parse the
> resources into a suitable `List<DocTree>`, using a relatively simple regular
> expression. The goal is to _not_ extend the support any further than that
> provided.
>
> Finally, one more change/bug fix. We do not claim to support HTML CDATA
> sections, but there is one in the JDK API docs (in `coll-index.html`).
> It "mostly worked" by accident, but dropped the leading `<`, which "broke"
> the section in the generated output. I've put code into `DocCommentParser`
> to handle CDATA sections, representing them (for now) as a `Text` node
> containing just the section itself. This is then handled specially up in
> `commentTagsToContent`, where it is translated to a new `RawHtml` node.
>
> For reviewing, I suggest starting with `RawHtml` and then `HtmlDocletWriter`.
> Many of the other changes are derivative changes, such as changing
> `new RawHtml(...)` to `RawHtml.of(...)` and similar other changes.
>
> All tests continue to pass without change.
>
> There are minor changes to the generated docs, in two flavors.
>
> 1. There is one instance, in `Collectors.html`, where the fixes to `charCount`
> are enough to change the separator after the type parameters in a
> method summary table from ` ` to `<br>`.
> See `AbstractMemberWriter` line 207. The current threshold is 10, and
> some of the methods in `Collectors.html` were less and are now more.
>
> 2. A number of files use an unescaped `>` in doc comments. These are now
> translated to `>` in the output. The change in behavior is because of
> changing to use `html.Text` nodes instead of `html.RawHtml` nodes to represent
> user-provided text in `TextTree` nodes.
> Previously, the generated output was inconsistent in this area: most
> instances used `>`, but user-provided `>` was passed through.
> Now, the output is consistent.
> FWIW, `tidy` prefers the use of `>` as well.
The changes look good, apart from a few minor nitpicks in the attribute handling code.
The only remaining issue I have is that I don't (yet) fully convinced the old `{@docRoot}` handling code does the same thing as the new one, so I'm still working on that.
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/HtmlDocletWriter.java line 1555:
> 1553: pendingDocRoot.accept(this, content);
> 1554: pendingDocRoot = null;
> 1555: first = false;
first is already false here and a few lines above since pendingDocRoot assignment below is followed by a `first = false` assignment.
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/HtmlDocletWriter.java line 1558:
> 1556: }
> 1557:
> 1558: if (dt instanceof TextTree tt){
Missing space before curly bracket.
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/HtmlDocletWriter.java line 1561:
> 1559: String text = tt.getBody();
> 1560: if (first && isHRef) {
> 1561: text = redirectRelativeLinks(element, (TextTree) dt);
Could use tt instead of casting again.
-------------
PR: https://git.openjdk.org/jdk/pull/9210
More information about the javadoc-dev
mailing list