Re: JDK-8268622 - Performance issues in javac `Name` class

Jonathan Gibbons jonathan.gibbons at oracle.com
Mon Mar 6 20:50:05 UTC 2023


See also this:
https://en.wikipedia.org/wiki/Unicode_equivalence

-- Jon


On 3/5/23 3:12 PM, Archie Cobbs wrote:
> Hi Jon,
>
> Thanks for taking a look at the patch.
>
> On Fri, Mar 3, 2023 at 5:07 PM Jonathan Gibbons 
> <jonathan.gibbons at oracle.com> wrote:
>
>     I would give you inline code comments, except that it's not a PR
>     yet.  I note that I generally distrust the `getMessage` for any
>     exception for which the message is not formally specified in some
>     way ... in other words, don't assume that `e.getMessage()` by
>     itself is interesting.
>
>
> That makes sense, and is easy to fix - thanks for the suggestion.
>
>     Is it possible to write a test for the bug fix in PoolReader?  
>     What is an example of a name encoded in two different ways?
>
> In any multi-byte UTF-8 sequence, every byte after the first is
> supposed to have the binary form 10xxxxxx. But the code is not
> checking that, so e.g., a byte of the form 11xxxxxx or 00xxxxxx
> would decode to the same character without matching byte-for-byte.
> For example, è = c3 a8, but Convert.java would also accept c3 e8 or
> c3 28 for "è".
>
> Because the Name hash tables store UTF-8 byte sequences, if the same 
> Name were encoded two different ways, it would get added to the hash 
> table twice.
>
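The two paragraphs above can be illustrated with a minimal sketch. It is not the actual Convert.java code: the class and method names are hypothetical, and the decoder only mimics the behavior described, masking the second byte with 0x3f without checking its top two bits. All three byte sequences decode to U+00E8, yet a table keyed on the raw bytes ends up with three separate entries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class LenientUtf8Demo {
    // Hypothetical lenient decoder for a 2-byte sequence: it keeps the low
    // six bits of the second byte and never checks that the top bits are 10.
    static char decodeTwoBytesLeniently(byte[] b) {
        return (char) (((b[0] & 0x1f) << 6) | (b[1] & 0x3f));
    }

    // The missing check: a continuation byte must look like 10xxxxxx.
    static boolean isContinuationByte(byte b) {
        return (b & 0xc0) == 0x80;
    }

    // Counts distinct keys when names are keyed on their raw bytes,
    // standing in for a hash table that stores UTF-8 byte sequences.
    static int distinctByteKeys(byte[][] encodings) {
        Set<String> keys = new HashSet<>();
        for (byte[] e : encodings) {
            keys.add(Arrays.toString(e));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        byte[][] encodings = {
            { (byte) 0xc3, (byte) 0xa8 },  // valid: second byte is 10xxxxxx
            { (byte) 0xc3, (byte) 0xe8 },  // invalid: second byte is 11xxxxxx
            { (byte) 0xc3, (byte) 0x28 },  // invalid: second byte is 00xxxxxx
        };
        for (byte[] e : encodings) {
            System.out.printf("%02x %02x -> U+%04X, valid continuation: %b%n",
                    e[0] & 0xff, e[1] & 0xff,
                    (int) decodeTwoBytesLeniently(e),
                    isContinuationByte(e[1]));
        }
        System.out.println("distinct byte keys: " + distinctByteKeys(encodings));
    }
}
```

Rejecting any sequence for which isContinuationByte returns false would leave only one accepted encoding for this character.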
> Another way this can happen is e.g. encoding a character as a 3-byte 
> sequence when the character is actually small enough to fit in a 
> 2-byte sequence. For example, e0 84 80 encodes character 0x0100, but 
> it should really be encoded as c4 80.
>
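The overlong case can be sketched the same way. Again the names are hypothetical and the lenient decoder is only an illustration of a decoder with no shortest-form check, not javac's code. For comparison, the sketch also asks the JDK's own strict UTF-8 decoder (java.nio.charset) whether it accepts each sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class OverlongUtf8Demo {
    // Hypothetical lenient decoder for a 3-byte sequence: it combines the
    // payload bits and never checks that a shorter encoding exists.
    static int decodeThreeBytesLeniently(byte[] b) {
        return ((b[0] & 0x0f) << 12) | ((b[1] & 0x3f) << 6) | (b[2] & 0x3f);
    }

    // Shortest encoding length in standard UTF-8. Class files use modified
    // UTF-8, which differs for U+0000 and supplementary characters; this
    // sketch ignores those cases.
    static int minimalUtf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // True if the JDK's strict UTF-8 decoder accepts the bytes. A decoder
    // from newDecoder() reports malformed input by throwing.
    static boolean strictDecoderAccepts(byte[] b) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] overlong = { (byte) 0xe0, (byte) 0x84, (byte) 0x80 };
        byte[] shortest = { (byte) 0xc4, (byte) 0x80 };
        int cp = decodeThreeBytesLeniently(overlong);
        System.out.printf("lenient decode: U+%04X, minimal length %d, actual length %d%n",
                cp, minimalUtf8Length(cp), overlong.length);
        System.out.println("strict decoder accepts e0 84 80: " + strictDecoderAccepts(overlong));
        System.out.println("strict decoder accepts c4 80: " + strictDecoderAccepts(shortest));
    }
}
```

A sequence whose length exceeds minimalUtf8Length for the decoded value is overlong, which is the condition a stricter reader would need to reject.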
> Thinking more about this, I think I should create a separate bug and 
> patch for this particular problem. So, expect a digression on that next...
>
>     Although conceptually simple, this is a significant change for a
>     very low level data type. It would be worth doing more testing
>     than just the usual langtools tests. For example, if you build JDK
>     before and after this change, are the generated class files the same?
>
> Definitely a test worth doing.
>
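One way to run the before/after comparison suggested above is a byte-for-byte diff of the two build outputs. The following is a rough sketch under stated assumptions: the class name is hypothetical, the two directory arguments stand for wherever each build wrote its class files, and Files.mismatch requires Java 12 or later. It only walks the first tree, so class files that exist only in the second tree are not reported.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ClassFileTreeDiff {
    // Returns the relative paths of .class files under 'before' that are
    // missing from 'after' or whose contents differ between the two trees.
    static List<Path> differingClassFiles(Path before, Path after) throws IOException {
        List<Path> classFiles;
        try (Stream<Path> walk = Files.walk(before)) {
            classFiles = walk.filter(Files::isRegularFile)
                             .filter(p -> p.toString().endsWith(".class"))
                             .map(before::relativize)
                             .sorted()
                             .collect(Collectors.toList());
        }
        List<Path> differing = new ArrayList<>();
        for (Path rel : classFiles) {
            Path other = after.resolve(rel);
            // Files.mismatch returns -1 when both files have identical bytes.
            if (!Files.isRegularFile(other)
                    || Files.mismatch(before.resolve(rel), other) != -1L) {
                differing.add(rel);
            }
        }
        return differing;
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1]: output directories of the two builds (assumed).
        differingClassFiles(Path.of(args[0]), Path.of(args[1]))
                .forEach(System.out::println);
    }
}
```

An empty result means every class file from the first build has an identical counterpart in the second.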
> -Archie
>
> -- 
> Archie L. Cobbs