RFR: 8354908: javac mishandles supplementary character in character literal
Jan Lahoda
jlahoda at openjdk.org
Wed Apr 30 13:10:03 UTC 2025
Some Unicode characters (the supplementary characters, above U+FFFF) are encoded in UTF-16 as a surrogate pair, i.e. two `char`s. Such characters cannot appear in a character literal, because a single `char` cannot represent them. However, javac currently accepts code with such characters and puts only the first `char`, the high surrogate, into the literal, silently ignoring the low surrogate.
For example, the JDK 24 behavior is:
$ cat /tmp/T.java
public class T {
    public static void main(String... args) {
        char c = '😊';
        System.err.println(Integer.toHexString((int) c));
        System.err.println(Character.isHighSurrogate(c));
    }
}
$ java /tmp/T.java
d83d
true
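For reference, the d83d printed above is only the first half of the surrogate pair for U+1F60A. A minimal sketch, using only standard `java.lang.Character` and `String` methods, shows the decomposition and how the full character fits in an `int` code point or a `String`. The class name `SurrogateDemo` and the helper `utf16Units` are made up for illustration and are not part of the PR:

```java
public class SurrogateDemo {
    // U+1F60A, the character used in T.java, held as an int code point.
    static final int SMILE = 0x1F60A;

    // Returns the UTF-16 code units of a code point as lowercase hex strings.
    static String[] utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        String[] hex = new String[units.length];
        for (int i = 0; i < units.length; i++) {
            hex[i] = Integer.toHexString(units[i]);
        }
        return hex;
    }

    public static void main(String... args) {
        // Prints the high and low surrogate: d83d de0a
        System.out.println(String.join(" ", utf16Units(SMILE)));
        // Number of chars needed for this code point: 2
        System.out.println(Character.charCount(SMILE));

        // A String holds both surrogates; it has two chars but one code point.
        String s = new String(Character.toChars(SMILE));
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1
    }
}
```

The second unit, de0a, is the low surrogate that the JDK 24 behavior drops.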
In JDK 11, however, such literals were rejected:
$ java /tmp/T.java
/tmp/T.java:3: error: unclosed character literal
char c = '😊';
^
/tmp/T.java:3: error: illegal character: '\ude0a'
char c = '😊';
^
/tmp/T.java:3: error: unclosed character literal
char c = '😊';
^
3 errors
error: compilation failed
The proposal in this PR is to check for this case explicitly when scanning a character literal, and to report a dedicated error when the literal contains a character that needs a surrogate pair. javac will produce an error like:
$ java /tmp/T.java
/tmp/T.java:3: error: character literal contains more than one UTF-16 code point
char c = '😊';
^
1 error
error: compilation failed
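The kind of check described above can be pictured roughly as follows. This is not the code from the PR and does not use javac's scanner classes; it is a standalone sketch that assumes the text between the quotes has already been extracted and contains no escape sequences. The names `CharLiteralCheck`, `Result`, and `check` are hypothetical:

```java
public class CharLiteralCheck {
    // Hypothetical outcome of validating the body of a character literal.
    enum Result { OK, EMPTY, SUPPLEMENTARY, TOO_LONG }

    // Classifies the raw text found between the single quotes.
    static Result check(String body) {
        if (body.isEmpty()) {
            return Result.EMPTY;
        }
        // codePointAt combines a valid surrogate pair into one code point.
        int first = body.codePointAt(0);
        if (Character.isSupplementaryCodePoint(first)) {
            // Needs two chars, so it cannot fit into a single char value.
            return Result.SUPPLEMENTARY;
        }
        if (body.length() > 1) {
            return Result.TOO_LONG;
        }
        return Result.OK;
    }

    public static void main(String... args) {
        System.out.println(check("a"));            // OK
        System.out.println(check("\uD83D\uDE0A")); // SUPPLEMENTARY
    }
}
```

In this sketch the `SUPPLEMENTARY` outcome corresponds to the case for which the new error is reported; the other outcomes are only there to make the example complete.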
-------------
Commit messages:
- Fixing CheckExamples test.
- 8354908: javac mishandles supplementary character in character literal
Changes: https://git.openjdk.org/jdk/pull/24964/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24964&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8354908
Stats: 62 lines in 4 files changed: 60 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/24964.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24964/head:pull/24964
PR: https://git.openjdk.org/jdk/pull/24964