[jdk17] Integrated: JDK-8269150 UnicodeReader not translating \u005c\\u005d to \\]
Jim Laskey
jlaskey at openjdk.java.net
Mon Jul 26 18:07:46 UTC 2021
On Wed, 23 Jun 2021 15:30:25 GMT, Jim Laskey <jlaskey at openjdk.org> wrote:
> This issue relates to *Unicode escapes*, described in section 3.3 of the JLS. javac interprets Unicode escapes during the reading of ASCII characters from source. Later on, javac interprets *escape sequences*, described in section 3.10.7 of the JLS, during the tokenization of character literals, string literals, and text blocks. Escape sequences are only indirectly affected by this bug.
>
> During reading, a _normal backslash_ (that is, the ASCII `\` character, not the corresponding Unicode escape `\u005c`) followed by another normal backslash is treated collectively as a pair of backslash characters. No further interpretation is done. This means that if a normal backslash immediately precedes the sequence `\` `u` `A` `B` `C` `D`, which would "normally" be interpreted as a Unicode escape, then the interpretation of that sequence as a Unicode escape is suppressed.
>
> For example, the sequence `\u2022` would be interpreted as the `•` character, whereas `\\u2022` would be interpreted as the seven characters `\` `\` `u` `2` `0` `2` `2`.
>
> An issue arises when Java developers choose to use a _Unicode escape backslash_ `\u005c` in their source code, instead of a normal backslash. Prior to JDK 16, if the Unicode escape backslash was followed by a second Unicode escape, then *the second Unicode escape was always interpreted*. The normal backslash at the beginning of the second Unicode escape (immediately followed by `u`) was *not* paired with the preceding Unicode escape backslash. Otherwise, any following normal backslash was paired with the `\u005c`.
>
> For example, the sequence `\u005c\u2022` would be interpreted as `\` and `•`, whereas `\u005c\tXYZ` would be interpreted as `\` `\` `t` `X` `Y` `Z`.
>
> The bug in JDK 16 was that `\u005c` was treated as having no effect on Unicode interpretation. Using the example from compiler-dev discussions, `\u005c\\u005d`:
>
> - Prior to JDK 16, it was interpreted as `\` `\` `]`
> - JDK 16 interpreted it as `\` `\` `\` `u` `0` `0` `5` `d`, which would produce a syntax error downstream in the lexer because the escape sequence `\u` is invalid.
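
The pairing rules described above can be modelled with a small standalone sketch. This is not javac's `UnicodeReader`; the class `UnicodeEscapeSketch` and its methods `escapeEnd` and `translate` are hypothetical names used only for illustration, and the model simplifies real behavior (for instance, a malformed escape is passed through as a normal backslash rather than reported as an error). It is meant to reproduce only the pre-JDK 16 interpretation of the examples in this message.

```java
public class UnicodeEscapeSketch {

    // Returns the index just past a well-formed Unicode escape starting at i
    // (backslash, one or more 'u', four hex digits), or -1 if none starts there.
    static int escapeEnd(String s, int i) {
        if (i >= s.length() || s.charAt(i) != '\\') return -1;
        int j = i + 1;
        if (j >= s.length() || s.charAt(j) != 'u') return -1;
        while (j < s.length() && s.charAt(j) == 'u') j++; // repeated 'u' is permitted
        if (j + 4 > s.length()) return -1;
        for (int k = j; k < j + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return -1;
        }
        return j + 4;
    }

    // Simplified model of the translation rules described in the message.
    static String translate(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            int end = escapeEnd(src, i);
            if (end < 0) {
                // Normal backslash: if another normal backslash follows, consume
                // both as a pair so the second cannot begin a Unicode escape.
                out.append('\\');
                i++;
                if (i < src.length() && src.charAt(i) == '\\') {
                    out.append('\\');
                    i++;
                }
                continue;
            }
            char decoded = (char) Integer.parseInt(src.substring(end - 4, end), 16);
            out.append(decoded);
            i = end;
            // A backslash produced by a Unicode escape pairs with a following
            // normal backslash, unless that backslash begins another escape.
            if (decoded == '\\' && i < src.length()
                    && src.charAt(i) == '\\' && escapeEnd(src, i) < 0) {
                out.append('\\');
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Backslashes are doubled here because these are Java string literals;
        // the last input is the example from the compiler-dev discussion.
        String[] inputs = {
            "\\u2022", "\\\\u2022", "\\u005c\\u2022", "\\u005c\\tXYZ", "\\u005c\\\\u005d"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + translate(in));
        }
    }
}
```

Under this model the compiler-dev example translates to two backslashes followed by `]`, matching the pre-JDK 16 behavior listed above. The JDK 16 result corresponds to dropping the final pairing check in `translate`, so that the backslash produced by `\u005c` no longer consumes the normal backslash that follows it.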
This pull request has now been integrated.
Changeset: b76a8388
Author: Jim Laskey <jlaskey at openjdk.org>
URL: https://git.openjdk.java.net/jdk17/commit/b76a83888b00faff602726f5409e1c902b91e908
Stats: 96 lines in 2 files changed: 88 ins; 4 del; 4 mod
8269150: UnicodeReader not translating \u005c\\u005d to \\]
Reviewed-by: jjg, jlahoda, darcy
-------------
PR: https://git.openjdk.java.net/jdk17/pull/126