RFR: JDK-8288624: Cleanup CommentHelper.getText0 [v4]

Hannes Wallnöfer hannesw at openjdk.org
Tue Jul 12 16:07:53 UTC 2022


On Tue, 12 Jul 2022 14:43:59 GMT, Jonathan Gibbons <jjg at openjdk.org> wrote:

>> Please review a moderately simple fix to clean up (as in _remove_!) `CommentHelper.getText` and friends/overloads.
>> 
>> This is moderately simple, because most of the heavy lifting was done in 
>> [JDK-8288699](https://bugs.openjdk.org/browse/JDK-8288699), to clean up `commentTagsToContent`.
>> 
>> The uses of `CommentHelper.getText` can generally be replaced by either `commentTagsToContent` or just `DocTree.toString()`.
>> 
>> Two bugs were uncovered as a result of the cleanup.  These are described in detail in a comment with screenshots in the [bug report](https://bugs.openjdk.org/browse/JDK-8288624?focusedCommentId=14508488&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14508488)  
>> 
>> Fixing the `see-list-long` bug was a direct reason to clean up `commentTagsToContent`. The fix here could maybe be further improved by writing a simple visitor or (preferably) a pattern-switch when that is a standard feature of the language.
>> 
>> Fixing the other bug was mostly an accidental side-effect of just using `commentTagsToContent` instead of `CommentHelper.getText`, since the tags now get interpreted instead of ignored.  However, one tweak was necessary.
>> The doc comments for serialization info end up in `serialized-form.html` and not in the primary file for the enclosing type.
>> This means they should not undergo the standard `redirectRelativeLinks` treatment. Links using `{@link...}` are not affected, but links using explicit `<a href="relative-link">...</a>` are affected. Ideally, we should not be using such relative links in the JDK API documentation, but there are too many to change/fix as part of this work. The fix, for now, is to add a new overload to `commentTagsToContent` that provides the ability to disable the call to `redirectRelativeLinks` when needed ... that is, when generating `serialized-form.html`.
>> 
>> Initially, the goal was just a cleanup fix with no change to tests. The work has been tested by comparing generated docs before and after this work. There are a number of instances of differences in the generated docs, but all are instances of the bugs described above ... either the `see-list-long` bug, or the change that inline doc comment tags are now interpreted in places where they were previously ignored.  All existing tests continue to work without modification; new tests have been added for the fixes for the bugs that were discovered in the course of this work.
>
> Jonathan Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove dead code

Looks mostly good! The one thing I have doubts about is the regular expression for HTML entities.

src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/TagletWriterImpl.java line 361:

> 359:         String s = c.toString()
> 360:                 .replaceAll("<.*?>", "")            // ignore HTML
> 361:                 .replaceAll("\\&[a-z0-9]+;?", " ")  // entities count as a single character

I don't think the regex for entities is correct (or complete). Numeric entities contain a `#` (such as `&#197;`), entities can contain upper case characters (like `&Auml;`), and I don't think the semicolon is optional as this seems to suggest.
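
To illustrate the concern, here is a rough sketch of a pattern that would cover named, decimal and hexadecimal references with a mandatory semicolon. The class name `EntityRegexSketch` and the method `plainText` are made up for this example and are not part of `TagletWriterImpl`; requiring the semicolon is an assumption that follows the reading above, not a statement about what the final fix should be.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not code from the PR: strips tags and collapses
// each character reference to a single space so it counts as one character.
public class EntityRegexSketch {
    // Same non-greedy tag pattern as in the snippet under review.
    private static final Pattern TAGS = Pattern.compile("<.*?>");

    // Named (&Auml;), decimal (&#197;) and hexadecimal (&#xC5;) references.
    // Assumption: the trailing semicolon is required.
    private static final Pattern ENTITY =
            Pattern.compile("&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);");

    public static String plainText(String html) {
        String noTags = TAGS.matcher(html).replaceAll("");
        return ENTITY.matcher(noTags).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Each of the three references becomes one space; "R&D" is left alone.
        System.out.println(plainText("<b>&Auml;</b>&#197;&#xC5; R&D").length());
    }
}
```

With this pattern a bare ampersand as in `R&D` is not treated as an entity, whereas the `;?` in the current regex would replace `&D` with a space.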

-------------

PR: https://git.openjdk.org/jdk/pull/9438

