RFR: 8358066: Non-ascii package names gives compilation error "import requires canonical name"
Jan Lahoda
jlahoda at openjdk.org
Mon Jun 2 18:06:53 UTC 2025
On Sat, 31 May 2025 21:05:35 GMT, Archie Cobbs <acobbs at openjdk.org> wrote:
> A simple counting bug in `Convert.utfNumChars()` causes bogus compiler errors for `import` statements of non-ASCII class names when the compiler is configured to use one of the older UTF-8 based `Name` table implementations (e.g., by specifying the `-XDuseUnsharedTable=true` flag).
Overall, looks sensible. Comments for consideration inline.
src/jdk.compiler/share/classes/com/sun/tools/javac/util/Convert.java line 231:
> 229: off += nbytes;
> 230: len -= nbytes;
> 231: numChars++;
I wonder if it wouldn't be easier to simply ignore bytes from `buf` in the form of `0b10xxxxxx`? E.g. something along these lines:
Suggestion:
    int byte1 = buf[off++];
    if ((byte1 & 0b11000000) != 0b10000000) {
        //only count the first byte in every encoded sequence,
        //and ignore the other:
        numChars++;
    }
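For illustration only, here is how that idea might look as a complete, standalone method. This is a sketch, not the actual `Convert.utfNumChars()` code: the class name `Utf8CharCount` and the method `countChars` are made up for the example, and it assumes every `char` is encoded on its own as a 1 to 3 byte sequence (as in modified UTF-8), so that the number of non-continuation bytes equals the number of `char`s.

```java
// Hypothetical helper, not part of javac: counts encoded characters by
// counting every byte that is NOT a continuation byte (0b10xxxxxx).
final class Utf8CharCount {

    private Utf8CharCount() { }

    // Returns the number of encoded sequences in buf[off .. off + len - 1].
    // Assumes the range starts and ends on sequence boundaries.
    static int countChars(byte[] buf, int off, int len) {
        int numChars = 0;
        for (int end = off + len; off < end; off++) {
            // Lead bytes look like 0xxxxxxx, 110xxxxx or 1110xxxx;
            // only continuation bytes have the top two bits set to 10.
            if ((buf[off] & 0b11000000) != 0b10000000) {
                numChars++;
            }
        }
        return numChars;
    }
}
```

With the bytes of "ab«cd≤ef" (10 bytes in UTF-8), `countChars(bytes, 0, bytes.length)` gives 8. Note that with standard UTF-8, a supplementary character is a single 4-byte sequence and would be counted once here, while `String.length()` reports 2 for it; the sketch only matches `char` counts under the per-`char` encoding assumption above.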
test/langtools/tools/javac/nametable/TestUtfNumChars.java line 42:
> 40:
> 41: // This is the string "ab«cd≤ef🟢gh"
> 42: String s = "ab\u00ABcd\u2264ef\ud83d\udd34gh";
Nit: not sure if there's a strong reason to use escapes in the string literal, esp. given the Unicode characters are used in the comment above. Given https://github.com/openjdk/jdk/pull/24574 is integrated, I would say, use UTF-8 in the string literal, and drop the comment?
-------------
PR Review: https://git.openjdk.org/jdk/pull/25567#pullrequestreview-2889427355
PR Review Comment: https://git.openjdk.org/jdk/pull/25567#discussion_r2121824627
PR Review Comment: https://git.openjdk.org/jdk/pull/25567#discussion_r2121820096