RFR: 8225671: Support HTML 5 character references in javadoc
Jonathan Gibbons
jonathan.gibbons at oracle.com
Wed Jun 12 20:40:07 UTC 2019
Hannes,
Since time is growing short, this version is OK/approved if you do not
think it worthwhile to compress the representation any further.
-- Jon
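For illustration only, the two-table layout suggested in the quoted message below might look like the following minimal sketch. The class name `EntityTables`, the method `lookup`, and the sample entries are hypothetical and not taken from the webrev. One assumption matters here: "single-character" has to mean a single UTF-16 `char`. An entity whose value is one supplementary code point (for example `Afr`, U+1D504) needs a surrogate pair, so it belongs in the `String` table along with the entities that map to two code points.

```java
import java.util.Map;

// Hypothetical sketch of a two-table entity lookup; not the javadoc implementation.
final class EntityTables {

    private EntityTables() { }

    // Entities whose value fits in exactly one UTF-16 char (BMP, one code point).
    private static final Map<String, Character> SINGLE = Map.of(
            "amp", '&',
            "lt", '<',
            "gt", '>',
            "quot", '"',
            "nbsp", '\u00A0');

    // Entities whose value needs more than one UTF-16 char.
    private static final Map<String, String> MULTI = Map.of(
            "fjlig", "fj",                                  // two code points
            "Afr", new String(Character.toChars(0x1D504))); // one supplementary code point

    /** Returns the replacement text for an entity name, or null if it is unknown. */
    static String lookup(String name) {
        Character c = SINGLE.get(name);
        if (c != null) {
            return String.valueOf(c.charValue());
        }
        return MULTI.get(name);
    }
}
```

On the boxing question: autoboxing goes through `Character.valueOf`, which always caches the values U+0000 to U+007F, so the ASCII entries share cached instances and only the remaining single-character values allocate their own `Character` objects.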
On 6/12/19 1:33 PM, Jonathan Gibbons wrote:
> Hannes,
>
> Would two tables be a more compact representation: one for
> single-character entities and another for multi-character entities?
>
> Is that worth considering? I guess that until we have value types, we
> would still have to box the single-character ones, but a Character
> should still be smaller than a String, right?
>
> -- Jon
>
> On 6/12/19 1:10 PM, Hannes Wallnöfer wrote:
>> Please review:
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8225671
>> Webrev: http://cr.openjdk.java.net/~hannesw/8225671/webrev.00/
>>
>> This is the second attempt at supporting HTML 5 entities after
>> JDK-8222318 had to be reverted.
>>
>> Fortunately, I didn’t have to keep the HTML 4 entities around after
>> all, as I had assumed; I was just thoroughly confused by the test
>> output.
>>
>> Given the huge increase in the number of entities, I decided to switch
>> from an enum to a plain class with a static Map. Entity values are
>> now stored as strings, since some entities map to two code points.
>> Also, we no longer need the reverse table to look up numeric
>> entities, as HTML 5 has a concise definition of valid numeric
>> entities [1].
>>
>> [1]: https://www.w3.org/TR/html52/syntax.html#character-references
>>
>> I updated the test with entities from all relevant groups (new valid
>> named and numeric entities, and invalid entities from control characters,
>> surrogates, and noncharacters). I also tested these manually using
>> the W3C HTML validator [2]. Mach4 tier 1 tests also pass.
>>
>> [2]: https://validator.w3.org/
>>
>> Hannes
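As a rough illustration of the "concise definition" cited as [1]: as I read HTML 5.2, a numeric character reference may refer to any code point except U+0000, U+000D, noncharacters, surrogates (U+D800 to U+DFFF), and control characters other than space characters. The sketch below encodes that reading. The class `NumericReferences` and its methods are hypothetical names, not the code in the webrev, and they check authoring validity only, not a browser's error recovery.

```java
// Hypothetical sketch of the HTML 5.2 validity rule for numeric character references.
final class NumericReferences {

    private NumericReferences() { }

    /** Returns true if code point cp may be written as a numeric character reference. */
    static boolean isValidNumericReference(int cp) {
        if (cp <= 0 || cp > Character.MAX_CODE_POINT) {
            return false;                 // U+0000, negative or out-of-range values
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;                 // surrogates
        }
        if (cp >= 0xFDD0 && cp <= 0xFDEF) {
            return false;                 // noncharacters U+FDD0..U+FDEF
        }
        if ((cp & 0xFFFE) == 0xFFFE) {
            return false;                 // noncharacters xxFFFE and xxFFFF in every plane
        }
        if (Character.isISOControl(cp)) {
            // C0/C1 controls are invalid except TAB, LF and FF; CR (0x0D) stays invalid.
            return cp == 0x09 || cp == 0x0A || cp == 0x0C;
        }
        return true;
    }

    /** Decodes a body such as "#65" or "#x41"; returns null if malformed or invalid. */
    static String decodeNumeric(String body) {
        if (body.length() < 2 || body.charAt(0) != '#') {
            return null;
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        String digits = body.substring(hex ? 2 : 1);
        if (digits.isEmpty()) {
            return null;
        }
        for (int i = 0; i < digits.length(); i++) {
            char ch = digits.charAt(i);
            boolean ok = (ch >= '0' && ch <= '9')
                    || (hex && ((ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F')));
            if (!ok) {
                return null;              // only ASCII digits (and hex letters) allowed
            }
        }
        int cp;
        try {
            cp = Integer.parseInt(digits, hex ? 16 : 10);
        } catch (NumberFormatException e) {
            return null;                  // too large for an int
        }
        return isValidNumericReference(cp) ? new String(Character.toChars(cp)) : null;
    }
}
```

With a rule like this, the checker needs no reverse lookup table: the decision depends only on the numeric value, which matches the groups covered by the test (controls, surrogates, noncharacters).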
More information about the javadoc-dev mailing list