RFR - JDK-8202442 - String::unescape (Code Review)
Stuart Marks
stuart.marks at oracle.com
Wed Sep 19 22:21:18 UTC 2018
On 9/18/18 10:51 AM, Jim Laskey wrote:
> Please review the code for String::unescape. Used to translate escape sequences in a string, typically in a raw string literal, into characters represented by those escapes.
>
> webrev: http://cr.openjdk.java.net/~jlaskey/8202442/webrev/index.html
> jbs: https://bugs.openjdk.java.net/browse/JDK-8202442
> csr: https://bugs.openjdk.java.net/browse/JDK-8202443
Hi Jim,
For citing the JLS, there's a @jls javadoc tag that you might want to use. There
are a couple of usages elsewhere in String.java already.
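For example, a method comment could cite the relevant JLS sections like this. This is a sketch only: the class and method names are hypothetical, the body is a placeholder, and the section numbers assume the Java SE 11 JLS (3.3 Unicode Escapes, 3.10.6 Escape Sequences for Character and String Literals). The @jls tag is defined by the JDK's own doc build. Outside that build, plain javadoc will likely complain about an unknown tag unless it is declared with a -tag option.

```java
public final class JlsTagExample {
    /**
     * Hypothetical method used only to show the tag placement.
     *
     * @param s the input string
     * @return the input string, unchanged (placeholder body)
     * @jls 3.3 Unicode Escapes
     * @jls 3.10.6 Escape Sequences for Character and String Literals
     */
    public static String unescapeSketch(String s) {
        // Placeholder: the real translation logic is not shown here.
        return s;
    }

    public static void main(String[] args) {
        System.out.println(unescapeSketch("abc")); // abc
    }
}
```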
Is there going to be an escape() method that does the inverse of this? I thought
that this was part of your original suite of string enhancements. Will this be
proposed separately, or is it unnecessary?
2979 * Each unicode escape in the form \unnnn is translated to the
2980 * unicode character whose code point is {@code 0xnnnn}. Care should be
2981 * taken when using UTF-16 surrogate pairs to ensure that the high
2982 * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
2983 * (U+DC00..U+DFFF) otherwise a
2984 * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
2985 * decoding.
I know you're going to update this based on Naoto's comments, but I'd suggest
rethinking this section. The \unnnn construct is called a "Unicode escape" per
JLS 3.3, but how it's handled has little to do with Unicode. The nnnn digits are
simply translated into a 16-bit 'char' value. Any such value will work, even if
it's an unassigned code point (such as 0xFFF0) or an unpaired surrogate.
I believe this is consistent with the JLS treatment of \unnnn.
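As a quick illustration of that JLS behavior, javac accepts both kinds of value in ordinary literals. This is a small self-contained sketch and the class and field names are made up for the example. The compiler turns each Unicode escape into a raw 16-bit char before lexing, with no check that the result is an assigned character or a well-formed surrogate pair.

```java
public final class UnicodeEscapeDemo {
    // The compiler translates these Unicode escapes (JLS 3.3) into raw char values.
    static final char UNASSIGNED = '\uFFF0';  // not an assigned character, but a legal char
    static final String LONE_HIGH = "\uD800"; // unpaired high surrogate, but a legal String

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(UNASSIGNED));                // fff0
        System.out.println(Character.isDefined(UNASSIGNED));                // false
        System.out.println(LONE_HIGH.length());                             // 1
        System.out.println(Character.isHighSurrogate(LONE_HIGH.charAt(0))); // true
    }
}
```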
It might be sufficient to say that \unnnn is translated into a 16-bit 'char'
value, and leave it at that.
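To make the suggested wording concrete, here is a minimal sketch of a translation step that does exactly that and nothing more. The helper name translateUnicodeEscapes is hypothetical and is not the code in the webrev. It handles only the backslash-u-nnnn form and copies all other characters through unchanged. It also simplifies the JLS rules: it does not accept repeated 'u' markers and does not apply the even-number-of-backslashes rule.

```java
public final class UnescapeSketch {
    // Strict ASCII hex digit check (Character.digit would also accept non-ASCII digits).
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    /** Hypothetical helper: maps each backslash-u-nnnn to the 16-bit char 0xnnnn. */
    static String translateUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
                int value = 0;
                boolean ok = true;
                for (int j = i + 2; j < i + 6; j++) {
                    int d = hexValue(s.charAt(j));
                    if (d < 0) { ok = false; break; }
                    value = (value << 4) | d;
                }
                if (ok) {
                    // No Unicode validation: any 16-bit value, even a lone surrogate, is kept.
                    sb.append((char) value);
                    i += 6;
                    continue;
                }
            }
            // Not a well-formed escape: copy the character through unchanged.
            sb.append(c);
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(translateUnicodeEscapes("\\u0041BC")); // ABC
        System.out.println(translateUnicodeEscapes("x\\uD800y").length()); // 3
    }
}
```

Because the result is just a char, surrogate pairing becomes the caller's concern, which is the point of describing \unnnn simply as producing a 16-bit 'char' value.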
s'marks
More information about the amber-dev mailing list