Integrated: 8354908: javac mishandles supplementary character in character literal
Jan Lahoda
jlahoda at openjdk.org
Tue May 13 06:18:58 UTC 2025
On Wed, 30 Apr 2025 13:05:04 GMT, Jan Lahoda <jlahoda at openjdk.org> wrote:
> Some Unicode characters are encoded in UTF-16 as a surrogate pair, i.e. two `char`s. Such characters cannot appear in a character literal, because a single `char` cannot represent them. However, javac currently accepts code with such characters and puts only the first `char`, the high surrogate, into the literal, silently ignoring the second one.
>
> For example, the JDK 24 behavior is:
>
> $ cat /tmp/T.java
> public class T {
>     public static void main(String... args) {
>         char c = '😊';
>         System.err.println(Integer.toHexString((int) c));
>         System.err.println(Character.isHighSurrogate(c));
>     }
> }
> $ java /tmp/T.java
> d83d
> true
>
>
> In JDK 11, however, such literals were rejected:
>
> $ java /tmp/T.java
> /tmp/T.java:3: error: unclosed character literal
> char c = '😊';
> ^
> /tmp/T.java:3: error: illegal character: '\ude0a'
> char c = '😊';
> ^
> /tmp/T.java:3: error: unclosed character literal
> char c = '😊';
> ^
> 3 errors
> error: compilation failed
>
>
> The proposal in this PR is to explicitly check for this case when scanning a character literal, and to report a dedicated error when a character that requires a surrogate pair is used. javac will produce an error like:
>
> $ java /tmp/T.java
> /tmp/T.java:3: error: character literal contains more than one UTF-16 code point
> char c = '😊';
> ^
> 1 error
> error: compilation failed
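
For readers who want to see why such a character cannot fit in a single `char`, here is a minimal sketch that is not part of the PR. The class name `SupplementaryDemo` and the helper `hexUnits` are hypothetical names chosen for illustration; the sketch only uses standard `java.lang.Character` and `String` methods, and it builds the emoji from its code point (U+1F60A) instead of a literal so that it does not depend on the source file encoding.

```java
public class SupplementaryDemo {

    // Hypothetical helper: returns the UTF-16 code units of a code point
    // as space-separated lowercase hex, e.g. "d83d de0a".
    static String hexUnits(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F60A is the emoji used in the example above; it lies outside
        // the Basic Multilingual Plane, so it needs two chars in UTF-16.
        int codePoint = 0x1F60A;
        String s = new String(Character.toChars(codePoint));

        System.out.println(Character.isSupplementaryCodePoint(codePoint)); // true
        System.out.println(Character.charCount(codePoint));                // 2
        System.out.println(s.length());                                    // 2
        System.out.println(s.codePointCount(0, s.length()));               // 1
        System.out.println(hexUnits(codePoint));                           // d83d de0a
    }
}
```

Code that needs to hold such a character can keep it as an `int` code point or as a `String`, as the sketch does, rather than as a `char`.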
This pull request has now been integrated.
Changeset: 03dca032
Author: Jan Lahoda <jlahoda at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/03dca0323d79ef5fb1c8ee1152667e2188fa5e01
Stats: 62 lines in 4 files changed: 60 ins; 0 del; 2 mod
8354908: javac mishandles supplementary character in character literal
Reviewed-by: naoto, vromero
-------------
PR: https://git.openjdk.org/jdk/pull/24964