RFR: 8337111: Bad HTML checker for generated documentation [v2]

Nizar Benalla nbenalla at openjdk.org
Fri Dec 13 17:07:27 UTC 2024


On Mon, 2 Dec 2024 18:19:32 GMT, Nizar Benalla <nbenalla at openjdk.org> wrote:

>> Doccheck's human-generated reports are great at previewing a "chessboard" of results, giving the reader a quick glimpse of the quality/health of the documentation. But these tests needed to be automated, and they didn't easily translate into something that can be integrated into CI.
>> 
>> This PR includes an HTML and internal link test on `api/java.base` and a BadChars and Doctype test on the entire generated documentation bundle.
>> 
>> Here is an example of the output after running all tests on `api/java.base`:
>> 
>> Note: There is an active PR to fix the broken anchors left in `java.base` so this is not a blocker.
>> 
>> 
>> 
>> STDOUT:
>> STDERR:
>> test: test
>> Tidy found errors in the generated HTML
>> /Users/nizarbenalla/Work/jdk-repos/jdk1/build/macosx-aarch64/images/docs/api/java.base/java/lang/Class.html:323:87: Warning: <a> anchor "nest" already defined
>> Tidy output end.
>> 
>> 
>> api/java.base/java/util/concurrent/StructuredTaskScope.ShutdownOnFailure.html:245: id not found: api/java.base/java/util/concurrent/StructuredTaskScope.ShutdownOnFailure.html#TreeStructure
>> api/java.base/java/util/concurrent/StructuredTaskScope.ShutdownOnSuccess.html:242: id not found: api/java.base/java/util/concurrent/StructuredTaskScope.ShutdownOnSuccess.html#TreeStructure
>> api/java.base/java/lang/Class.html:323: name already declared: nest
>> api/java.base/java/lang/Module.html:291: id not found: api/java.base/java/lang/foreign/package-summary.html#restricted
>> api/java.base/java/lang/Module.html:434: id not found: api/java.base/java/lang/foreign/package-summary.html#restricted
>> api/java.base/java/lang/foreign/MemorySegment.html:725: id not found: api/java.base/java/lang/foreign/package-summary.html#restricted
>> 
>> Link Checker Report
>> Checked 3446 files.
>> Found 445059 references to 48205 anchors in 5770 files and 64 other URIs.
>>      1 duplicate ids
>>      3 missing ids
>> 
>> Hosts
>>     20 docs.oracle.com
>>      1 tools.ietf.org
>>      1 www.ietf.org
>>      1 jcp.org
>>      4 www.rfc-editor.org
>>      7 unicode.org
>>     10 www.unicode.org
>>     20 www.w3.org
>> Exception running test test: java.lang.Exception: One or more HTML checkers failed: [java.lang.RuntimeException: Tidy found errors in the generated HTML, java.lang.RuntimeException: LinkChecker encountered errors. Duplicate IDs: 1, Missing IDs: 3, Missing Files: 0, Bad Schemes: 0]
>> java.lang.Exception: One or more HTML checkers failed: [java.lang.RuntimeException: Tidy found errors in the generated HTML, java.lang.Ru...
>
> Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - add file with all vetted links
>  - improve some parts based on review comments
>  - Merge remote-tracking branch 'upstream/master' into new-docs-tests-suit
>  - Merge remote-tracking branch 'upstream/master' into new-docs-tests-suit
>  - Convert parts of doccheck into tests

I think I'd like to go for warnings instead of errors for external links, at least for a little while, to avoid unnecessary failures in CI.
Maybe until we let people know that they should add external resources to the whitelist, or until we set up GHA for doc tests.
I split the test categories into separate jtreg tests. I think we may get away with one test per category, even if it's testing all modules + specs.
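
To make the warn-first idea concrete, here is a rough sketch of how such a policy could look. It is illustrative only and not the code in the PR: the class name `ExternalLinkPolicy`, the `Report` record, and the `strict` flag are all hypothetical, and it assumes the whitelist is just a set of exact URLs. Links without a host (relative or internal links) are skipped, since those are covered by the internal link check.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and structure are assumptions, not the PR's code.
final class ExternalLinkPolicy {

    /** Messages collected for one batch of links. */
    record Report(List<String> warnings, List<String> errors) {}

    private final Set<String> vettedLinks; // assumed: exact URLs from the vetted-links file
    private final boolean strict;          // false = report as warnings only

    ExternalLinkPolicy(Set<String> vettedLinks, boolean strict) {
        this.vettedLinks = Set.copyOf(vettedLinks);
        this.strict = strict;
    }

    Report check(List<String> links) {
        List<String> warnings = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        for (String link : links) {
            // URI.create throws IllegalArgumentException on malformed input;
            // a real checker would probably want to report that instead.
            URI uri = URI.create(link);
            if (uri.getHost() == null) {
                continue; // relative/internal link, not handled here
            }
            if (vettedLinks.contains(link)) {
                continue; // already vetted
            }
            String message = "external link not in vetted list: " + link;
            // In lenient mode the message is a warning, so the test still passes.
            (strict ? errors : warnings).add(message);
        }
        return new Report(warnings, errors);
    }
}
```

A test using something like this would fail only when `errors` is non-empty, so flipping `strict` to `true` later would turn the same findings into CI failures without changing the checker itself.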

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21879#issuecomment-2541876121


More information about the javadoc-dev mailing list