RFR: 8358066: Non-ascii package names gives compilation error "import requires canonical name"

Mon Jun 2 18:06:53 UTC 2025

On Sat, 31 May 2025 21:05:35 GMT, Archie Cobbs <acobbs at openjdk.org> wrote:

> A simple counting bug in `Convert.utfNumChars()` causes bogus compiler errors for `import` statements of non-ASCII class names when the compiler is configured to use one of the older UTF-8 based `Name` table implementations (e.g., by specifying the `-XDuseUnsharedTable=true` flag).

Overall, looks sensible. Comments for consideration inline.

src/jdk.compiler/share/classes/com/sun/tools/javac/util/Convert.java line 231:

> 229:             off += nbytes;
> 230:             len -= nbytes;
> 231:             numChars++;

I wonder if it wouldn't be easier to simply skip the bytes in `buf` of the form `0b10xxxxxx`, i.e. the continuation bytes? E.g. something along these lines:
Suggestion:

            int byte1 = buf[off++];
            if ((byte1 & 0b11000000) != 0b10000000) {
                // only count the first byte of every encoded sequence
                // and ignore the continuation bytes:
                numChars++;
            }
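
To illustrate the idea outside of javac, here is a minimal standalone sketch. The class `LeadByteCount` and its methods are hypothetical names, not part of `Convert`, and the sketch assumes the buffer holds modified UTF-8 as produced by `DataOutputStream.writeUTF` (each UTF-16 char encoded separately in 1 to 3 bytes), which I believe matches what the UTF-8 based `Name` tables store:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper for illustration only; not javac code.
final class LeadByteCount {

    // Counts every byte in buf[off .. off+len) that is NOT a
    // continuation byte (0b10xxxxxx), i.e. one per encoded sequence.
    static int countChars(byte[] buf, int off, int len) {
        int numChars = 0;
        for (int end = off + len; off < end; off++) {
            if ((buf[off] & 0b11000000) != 0b10000000) {
                numChars++;
            }
        }
        return numChars;
    }

    // Encodes s as modified UTF-8 via DataOutputStream.writeUTF and
    // strips the two-byte length prefix that writeUTF emits first.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF(s);
            byte[] all = bos.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "ab\u00ABcd\u2264ef\ud83d\udd34gh";
        byte[] buf = modifiedUtf8(s);
        System.out.println(countChars(buf, 0, buf.length)
                + " counted, String.length() is " + s.length());
    }
}
```

One caveat with this approach: it yields the UTF-16 char count only because each surrogate is encoded as its own 3-byte sequence in modified UTF-8. With standard UTF-8, where a supplementary character is a single 4-byte sequence, the same loop would count code points instead.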

test/langtools/tools/javac/nametable/TestUtfNumChars.java line 42:

> 40: 
> 41:         // This is the string "ab«cd≤ef🟢gh"
> 42:         String s = "ab\u00ABcd\u2264ef\ud83d\udd34gh";

Nit: I'm not sure there's a strong reason to use escapes in the string literal, especially since the Unicode characters already appear in the comment above. Now that https://github.com/openjdk/jdk/pull/24574 is integrated, I would suggest using UTF-8 directly in the string literal and dropping the comment?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25567#pullrequestreview-2889427355
PR Review Comment: https://git.openjdk.org/jdk/pull/25567#discussion_r2121824627
PR Review Comment: https://git.openjdk.org/jdk/pull/25567#discussion_r2121820096