RFR: 8354908: javac mishandles supplementary character in character literal
Naoto Sato
naoto at openjdk.org
Mon May 5 22:50:20 UTC 2025
On Mon, 5 May 2025 22:41:57 GMT, Naoto Sato <naoto at openjdk.org> wrote:
>> Some Unicode characters are encoded in UTF-16 as a surrogate pair, i.e. two `char`s. Such characters cannot appear in a character literal, because a single `char` cannot represent them. However, javac currently accepts code with such characters and puts only the first `char`, the high surrogate, into the literal, ignoring the low surrogate.
>>
>> For example, the JDK 24 behavior is:
>>
>> $ cat /tmp/T.java
>> public class T {
>>     public static void main(String... args) {
>>         char c = '😊';
>>         System.err.println(Integer.toHexString((int) c));
>>         System.err.println(Character.isHighSurrogate(c));
>>     }
>> }
>> $ java /tmp/T.java
>> d83d
>> true
>>
>>
>> In JDK 11, however, such literals were rejected:
>>
>> $ java /tmp/T.java
>> /tmp/T.java:3: error: unclosed character literal
>> char c = '😊';
>> ^
>> /tmp/T.java:3: error: illegal character: '\ude0a'
>> char c = '😊';
>> ^
>> /tmp/T.java:3: error: unclosed character literal
>> char c = '😊';
>> ^
>> 3 errors
>> error: compilation failed
>>
>>
>> The proposal in this PR is to explicitly check for this case when scanning a character literal, and to report a dedicated error when a character encoded as a surrogate pair is used. javac will then produce an error like:
>>
>> $ java /tmp/T.java
>> /tmp/T.java:3: error: character literal contains more than one UTF-16 code point
>> char c = '😊';
>> ^
>> 1 error
>> error: compilation failed
>
> src/jdk.compiler/share/classes/com/sun/tools/javac/resources/compiler.properties line 698:
>
>> 696:
>> 697: compiler.err.illegal.char.literal.multiple.surrogates=\
>> 698: character literal contains more than one UTF-16 code point
>
> The error message is kind of vague. How about using "surrogate code point"?
> https://www.unicode.org/glossary/#surrogate_code_point
Or "UTF-16 code unit"?
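
To illustrate why "code unit" may be the more accurate term here, below is a minimal sketch using only standard `java.lang.String` and `java.lang.Character` methods. The class name `SurrogateDemo` and the helper `fitsInCharLiteral` are hypothetical names made up for this illustration; this is not how javac's scanner implements the check. It shows that the character from the example is a single code point made of two UTF-16 code units.

```java
public class SurrogateDemo {

    // Hypothetical helper: true if the text between the quotes of a
    // character literal would fit into a single char (one UTF-16 code unit).
    static boolean fitsInCharLiteral(String text) {
        // String.length() counts UTF-16 code units, not code points.
        return text.length() == 1;
    }

    public static void main(String[] args) {
        // U+1F60A, the character used in the example, written as its surrogate pair.
        String s = "\uD83D\uDE0A";

        // Number of UTF-16 code units (chars).
        System.out.println("code units:  " + s.length());
        // Number of Unicode code points.
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        // The single code point, in hex.
        System.out.println("code point:  " + Integer.toHexString(s.codePointAt(0)));
        // The two halves of the surrogate pair.
        System.out.println("high surrogate: " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate:  " + Character.isLowSurrogate(s.charAt(1)));
        // Whether the text could be stored in one char.
        System.out.println("fits in char literal: " + fitsInCharLiteral(s));
    }
}
```

So the literal contains one code point but more than one UTF-16 code unit, which is the distinction the message wording would need to capture.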
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24964#discussion_r2074337356