[External] : Re: JDK-8268622 - Performance issues in javac `Name` class
Jonathan Gibbons
jonathan.gibbons at oracle.com
Mon Mar 6 20:50:05 UTC 2023
See also this:
https://en.wikipedia.org/wiki/Unicode_equivalence
-- Jon
On 3/5/23 3:12 PM, Archie Cobbs wrote:
> Hi Jon,
>
> Thanks for taking a look at the patch.
>
> On Fri, Mar 3, 2023 at 5:07 PM Jonathan Gibbons
> <jonathan.gibbons at oracle.com> wrote:
>
> > I would give you inline code comments, except that it's not a PR
> > yet. I note that I generally distrust the `getMessage` for any
> > exception for which the message is not formally specified in some
> > way ... in other words, don't assume that `e.getMessage()` by
> > itself is interesting.
>
>
> That makes sense, and is easy to fix - thanks for the suggestion.
>
> > Is it possible to write a test for the bug fix in PoolReader?
> > What is an example of a name encoded in two different ways?
>
> In any multi-byte UTF-8 sequence, every byte after the first is
> supposed to have the binary form 10xxxxxx. But the code is not checking
> that, so e.g., a byte of the form 11xxxxxx or 00xxxxxx would decode to
> the same character but not match byte-for-byte. For example, è = c3
> a8, but Convert.java would also accept c3 e8 or c3 28 for "è".
>
> Because the Name hash tables store UTF-8 byte sequences, if the same
> Name were encoded two different ways, it would get added to the hash
> table twice.
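
The point above can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from Convert.java; the lenient decoder only mimics the behaviour described (masking the low six bits of the second byte without checking its top two bits), and the strict check uses the JDK's standard UTF-8 decoder for comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class LenientUtf8Demo {

    // Hypothetical lenient decoder for a 2-byte sequence: it keeps the low
    // six bits of the second byte and never checks that the top two are 10.
    static char lenientDecode2(int b0, int b1) {
        return (char) (((b0 & 0x1F) << 6) | (b1 & 0x3F));
    }

    // Strict validation via the JDK's UTF-8 decoder, set to report
    // malformed input instead of silently replacing it.
    static boolean isStrictlyValid(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The three byte pairs mentioned in the message for "è" (U+00E8)
        int[][] candidates = { {0xC3, 0xA8}, {0xC3, 0xE8}, {0xC3, 0x28} };
        for (int[] c : candidates) {
            char ch = lenientDecode2(c[0], c[1]);
            boolean valid = isStrictlyValid(new byte[] { (byte) c[0], (byte) c[1] });
            System.out.printf("%02x %02x -> U+%04X strict=%b%n",
                    c[0], c[1], (int) ch, valid);
        }
    }
}
```

All three pairs decode leniently to the same character, yet only c3 a8 passes strict validation, so a table keyed on the raw bytes would treat them as three different names.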
>
> Another way this can happen is e.g. encoding a character as a 3-byte
> sequence when the character is actually small enough to fit in a
> 2-byte sequence. For example, e0 84 80 encodes character 0x0100, but
> it should really be encoded as c4 80.
>
> Thinking more about this, I think I should create a separate bug and
> patch for this particular problem. So, expect a digression on that next...
>
> > Although conceptually simple, this is a significant change for a
> > very low level data type. It would be worth doing more testing
> > than just the usual langtools tests. For example, if you build JDK
> > before and after this change, are the generated class files the same?
>
> Definitely a test worth doing.
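
One possible way to run the before/after comparison suggested above is sketched here. It assumes the two builds have been written to two separate output directories (both supplied by the caller; no particular JDK build layout is assumed), and the class name is hypothetical. It only reports files present in the first tree, and it uses `Files.mismatch`, which requires Java 12 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ClassFileDiff {

    // Returns the relative paths of .class files under 'before' that are
    // missing from 'after' or differ from their counterpart byte-for-byte.
    static List<Path> differingClassFiles(Path before, Path after) throws IOException {
        List<Path> classFiles;
        try (Stream<Path> walk = Files.walk(before)) {
            classFiles = walk.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".class"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> diffs = new ArrayList<>();
        for (Path p : classFiles) {
            Path rel = before.relativize(p);
            Path other = after.resolve(rel);
            // Files.mismatch returns -1 when the two files are identical
            if (!Files.isRegularFile(other) || Files.mismatch(p, other) != -1L) {
                diffs.add(rel);
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ClassFileDiff <before-dir> <after-dir>");
            return;
        }
        List<Path> diffs = differingClassFiles(Paths.get(args[0]), Paths.get(args[1]));
        diffs.forEach(System.out::println);
        System.out.println(diffs.size() + " differing or missing class file(s)");
    }
}
```

An empty result only shows that the two trees agree for the files found in the first one; running it in both directions would also catch class files that appear only after the change.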
>
> -Archie
>
> --
> Archie L. Cobbs
More information about the compiler-dev
mailing list