RFR: 8354908: javac mishandles supplementary character in character literal

Wed Apr 30 13:10:03 UTC 2025

Some Unicode characters (supplementary characters, those above U+FFFF) are encoded in UTF-16 as a surrogate pair, i.e. two `char`s. Such a character cannot appear in a character literal, because a `char` holds only a single UTF-16 code unit. However, javac currently accepts code with such characters and puts only the first `char`, the high surrogate, into the literal, silently ignoring the low surrogate.

For example, the JDK 24 behavior is:

$ cat /tmp/T.java 
public class T {
    public static void main(String... args) {
       char c = '😊';
       System.err.println(Integer.toHexString((int) c));
       System.err.println(Character.isHighSurrogate(c));
    }
}
$ java /tmp/T.java
d83d
true
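
To make the `d83d` in the output above easier to follow, here is a small sketch that splits the code point from T.java, U+1F60A, into its UTF-16 code units using only `java.lang.Character`. The class name `SurrogateDemo` and the helper `utf16Units` are made up for illustration and are not part of the patch.

```java
public class SurrogateDemo {
    // Returns the UTF-16 code units of a code point as lowercase hex strings.
    static String[] utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        String[] hex = new String[units.length];
        for (int i = 0; i < units.length; i++) {
            hex[i] = Integer.toHexString(units[i]);
        }
        return hex;
    }

    public static void main(String[] args) {
        int smiley = 0x1F60A; // the character used in T.java
        System.out.println(Character.isSupplementaryCodePoint(smiley)); // true
        System.out.println(Character.charCount(smiley));                // 2
        System.out.println(String.join(" ", utf16Units(smiley)));       // d83d de0a
    }
}
```

The first unit, `d83d`, is the high surrogate that ends up in the literal on JDK 24; the second, `de0a`, is the low surrogate that is dropped.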

In JDK 11, however, such literals were rejected:

$ java /tmp/T.java
/tmp/T.java:3: error: unclosed character literal
       char c = '😊';
                ^
/tmp/T.java:3: error: illegal character: '\ude0a'
       char c = '😊';
                  ^
/tmp/T.java:3: error: unclosed character literal
       char c = '😊';
                   ^
3 errors
error: compilation failed

This PR proposes to check for this case explicitly when scanning a character literal, and to report a dedicated error when a supplementary character (one that needs a surrogate pair) is used. javac will produce an error like:

$ java /tmp/T.java
/tmp/T.java:3: error: character literal contains more than one UTF-16 code point
       char c = '😊';
                ^
1 error
error: compilation failed
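
The idea of the check can be sketched outside of javac as follows. This is only an illustration under simplifying assumptions, not the code in the patch: the class `CharLiteralCheck` and the method `check` are hypothetical, the input is the raw text between the two quotes, and escape sequences are not handled.

```java
public class CharLiteralCheck {
    // Hypothetical helper, not javac code: returns an error message for the
    // text between the quotes of a character literal, or null if the text
    // fits into a single char. Escape sequences are deliberately ignored.
    static String check(String body) {
        if (body.isEmpty()) {
            return "empty character literal";
        }
        int first = body.codePointAt(0);
        if (Character.isSupplementaryCodePoint(first)) {
            // The first character needs a surrogate pair, i.e. two chars.
            return "character literal contains more than one UTF-16 code point";
        }
        if (body.length() > 1) {
            return "unclosed character literal";
        }
        return null;
    }

    public static void main(String[] args) {
        String smiley = new String(Character.toChars(0x1F60A));
        System.out.println(check("a"));    // null
        System.out.println(check(smiley)); // the dedicated error message
    }
}
```

Detecting the surrogate pair up front is what lets the compiler report one targeted error instead of the three less helpful errors JDK 11 printed.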

-------------

Commit messages:
 - Fixing CheckExamples test.
 - 8354908: javac mishandles supplementary character in character literal

Changes: https://git.openjdk.org/jdk/pull/24964/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24964&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8354908
  Stats: 62 lines in 4 files changed: 60 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/24964.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24964/head:pull/24964

PR: https://git.openjdk.org/jdk/pull/24964