JDK-8254073, unicode escape preprocessing, and \u005C

Jim Laskey james.laskey at oracle.com
Tue Jun 22 17:48:54 UTC 2021


diff --git a/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java b/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java
index c51be0fdf07..9603fa0da7b 100644
--- a/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java
+++ b/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java
@@ -169,7 +169,7 @@ public class UnicodeReader {
             // May be an unicode escape.
             switch (unicodeEscape()) {
                 case BACKSLASH -> wasBackslash = true;
-                case VALID_ESCAPE -> wasBackslash = false;
+                case VALID_ESCAPE -> wasBackslash = character == '\\';
                 case BROKEN_ESCAPE -> nextUnicodeInputCharacter(); //skip broken unicode escapes
             }
         }


Running tests now.

On Jun 21, 2021, at 6:28 PM, Liam Miller-Cushon <cushon at google.com> wrote:

class T {
  public static void main(String[] args) {
    System.err.println("\u005C\\u005D");
  }
}

Before JDK-8254073, this prints `\]`.

After JDK-8254073, unicode escape processing yields `\\\u005D`, which javac then rejects with an 'invalid escape' error for `\u`. Was that deliberate?

JLS 3.3 says

> for each raw input character that is a backslash \, input processing must consider how many other \ characters contiguously precede it, separating it from a non-\ character or the start of the input stream. If this number is even, then the \ is eligible to begin a Unicode escape; if the number is odd, then the \ is not eligible to begin a Unicode escape.
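Read literally against raw input characters, the parity rule can be sketched as below. This is only an illustration: `EscapeEligibility` and `isEligible` are hypothetical names, not javac code, and the raw text is assembled from pieces so the example's own source contains no unicode escapes.

```java
public class EscapeEligibility {
    // Applies the quoted parity rule to RAW characters only: count the
    // backslashes that contiguously precede raw[index]; an even count
    // means the backslash is eligible to begin a Unicode escape.
    public static boolean isEligible(String raw, int index) {
        if (raw.charAt(index) != '\\') {
            throw new IllegalArgumentException("no backslash at index " + index);
        }
        int preceding = 0;
        for (int i = index - 1; i >= 0 && raw.charAt(i) == '\\'; i--) {
            preceding++;
        }
        return preceding % 2 == 0;
    }

    public static void main(String[] args) {
        String bs = "\\"; // a single backslash character
        // Raw text of the string literal body from the example program:
        // backslash, u005C, backslash, backslash, u005D
        String raw = bs + "u005C" + bs + bs + "u005D";
        System.out.println(isEligible(raw, 0)); // true: nothing precedes it
        System.out.println(isEligible(raw, 6)); // true: preceded by 'C'
        System.out.println(isEligible(raw, 7)); // false: one backslash precedes it
    }
}
```

Under this raw-only reading, the backslash in front of `u005D` is preceded by exactly one raw backslash, so it is not eligible.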

The difference is in whether `\u005C` (the unicode escape for `\`) counts as one of the `\` preceding a valid unicode escape.

Does "how many other \ characters contiguously precede it" refer to preceding raw input characters, or does it refer to preceding characters after unicode escape processing is performed on them?
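The two readings can be compared with a small stand-alone translator. This is a simplified sketch, not the javac `UnicodeReader`: the class name, the method, and the `escapedBackslashCounts` flag are hypothetical, and a malformed escape is passed through here rather than reported as an error.

```java
public class UnicodeEscapeSketch {
    // Translates unicode escapes in raw text. If escapedBackslashCounts is
    // true, a backslash produced by an escape counts as a preceding
    // backslash for the next character; if false, only raw ones count.
    public static String translate(String raw, boolean escapedBackslashCounts) {
        StringBuilder out = new StringBuilder();
        boolean wasBackslash = false; // an odd run of backslashes precedes
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\' || wasBackslash) {
                // Ordinary character, or the second backslash of a pair.
                out.append(c);
                wasBackslash = false;
                i++;
                continue;
            }
            // Eligible backslash: look for one or more 'u' and four hex digits.
            int j = i + 1;
            while (j < raw.length() && raw.charAt(j) == 'u') {
                j++;
            }
            if (j > i + 1 && hasFourHexDigits(raw, j)) {
                char decoded = (char) Integer.parseInt(raw.substring(j, j + 4), 16);
                out.append(decoded);
                wasBackslash = escapedBackslashCounts && decoded == '\\';
                i = j + 4;
            } else {
                out.append(c); // plain backslash (simplification: no error reporting)
                wasBackslash = true;
                i++;
            }
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String bs = "\\"; // a single backslash character
        // Raw literal body: backslash, u005C, backslash, backslash, u005D
        String raw = bs + "u005C" + bs + bs + "u005D";
        // Prints two backslashes followed by a closing bracket.
        System.out.println(translate(raw, true));
        // Prints three backslashes followed by the text u005D.
        System.out.println(translate(raw, false));
    }
}
```

With the flag set, the literal body becomes `\\]`, which as a string literal denotes `\]`. With it cleared, the body becomes `\\\u005D`, where `\\` is a valid string escape and the `\u` that follows is not.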

JLS 3.3 also mentions that a "character produced by a Unicode escape does not participate in further Unicode escapes", but I'm not sure if that applies here, since in the pre-JDK-8254073 interpretation the unicode-escaped backslash isn't really 'participating' in the second unicode escape.

Thanks,
Liam
