RFR: 8225671: Support HTML 5 character references in javadoc
Hannes Wallnöfer
hannes.wallnoefer at oracle.com
Wed Jun 12 20:10:31 UTC 2019
Please review:
JBS: https://bugs.openjdk.java.net/browse/JDK-8225671
Webrev: http://cr.openjdk.java.net/~hannesw/8225671/webrev.00/
This is the second attempt at supporting HTML 5 entities after JDK-8222318 had to be reverted.
Fortunately, I didn’t have to keep the HTML 4 entities around after all, as I had assumed. I had just been thoroughly confused by the test output.
Given the huge increase in the number of entities, I decided to switch from an enum to a plain class with a static Map. Entity values are now stored as strings, since some entities expand to two code points. We also no longer need the reverse table to look up numeric entities, as HTML 5 has a concise definition of valid numeric entities [1].
[1]: https://www.w3.org/TR/html52/syntax.html#character-references
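For readers who want a picture of the design described above, here is a minimal sketch. It is not the code from the webrev: the class name EntitySketch, the method names, and the handful of map entries are made up for illustration, and the validity check is only one reading of the rule in [1] (no U+0000, no U+000D, no surrogates, no noncharacters, and no control characters other than space characters).

```java
import java.util.Map;

/** Hypothetical sketch of a map-based entity table; not the class from the webrev. */
final class EntitySketch {

    // Values are strings rather than single code points, because some
    // HTML 5 entities (NotEqualTilde, for example) expand to two code points.
    // Only a few illustrative entries are listed here.
    private static final Map<String, String> NAMES = Map.of(
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "nbsp", "\u00A0",
            "NotEqualTilde", "\u2242\u0338");

    private EntitySketch() { }

    /** Returns the replacement text for a named entity, or null if the name is unknown. */
    static String getValue(String name) {
        return NAMES.get(name);
    }

    /** Assumed reading of the HTML 5 rule for numeric character references. */
    static boolean isValidCodePoint(int cp) {
        if (!Character.isValidCodePoint(cp)) {
            return false;                       // negative or above U+10FFFF
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;                       // surrogates
        }
        if (cp >= 0xFDD0 && cp <= 0xFDEF) {
            return false;                       // noncharacter block
        }
        if ((cp & 0xFFFE) == 0xFFFE) {
            return false;                       // last two code points of every plane
        }
        if (Character.isISOControl(cp)) {
            // Of the C0 and C1 controls, only tab, line feed and form feed are
            // accepted; this also rejects U+0000 and U+000D.
            return cp == 0x09 || cp == 0x0A || cp == 0x0C;
        }
        return true;
    }
}
```

With a check like this, a numeric reference can be validated directly from its code point, so no reverse lookup from code point to entity name is needed.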
I updated the test with entities from all relevant groups (new valid named and numeric entities, and invalid entities from control characters, surrogates, and non-characters). I also tested these manually using the W3C HTML validator [2]. Mach5 tier 1 tests pass as well.
[2]: https://validator.w3.org/
Hannes