RFR - JDK-8202442 - String::unescape (Code Review)
Chris Hegarty
chris.hegarty at oracle.com
Thu Sep 20 10:45:57 UTC 2018
> On 19 Sep 2018, at 23:21, Stuart Marks <stuart.marks at oracle.com> wrote:
>
> ...
>
> 2979 * Each unicode escape in the form \unnnn is translated to the
> 2980 * unicode character whose code point is {@code 0xnnnn}. Care should be
> 2981 * taken when using UTF-16 surrogate pairs to ensure that the high
> 2982 * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
> 2983 * (U+DC00..U+DFFF) otherwise a
> 2984 * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
> 2985 * decoding.
>
>
> I know you're going to update this based on Naoto's comments, but I'd suggest rethinking this section. The \unnnn construct is called a "Unicode escape" per JLS 3.3, but how it's handled has little to do with Unicode. The nnnn digits are simply translated into a 16-bit 'char' value. Any such value will work, even if it's an invalid UTF-16 code unit (such as 0xFFF0) or an unpaired surrogate.
I had a similar comment/question. CharacterCodingException (CCE) is a
checked exception, and since the method does not declare that it throws
CCE, I took a look at the implementation and came to the same conclusion
as Stuart: the nnnn digits are simply translated into a char value.
Additionally, why should noncharacter code points, like \uFFFE, be
translated at all? If an escape denotes a noncharacter code point or is
part of a malformed surrogate pair, would it not be better to just leave
the escape sequence as-is?
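
To make the point concrete, here is a rough sketch (not the code under
review). The class UnescapeSketch and its helpers toChar, isNoncharacter,
hasUnpairedSurrogate and strictUtf8Fails are names invented for this
illustration. It assumes the translation amounts to a plain hex-to-char
conversion, and shows that a lone surrogate only causes a CCE when some
later step encodes the string with a strict CharsetEncoder.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class UnescapeSketch {

    // Hypothetical helper: converts the four hex digits of an escape into
    // a 16-bit char value. No Unicode validation takes place here.
    static char toChar(String hexDigits) {
        return (char) Integer.parseInt(hexDigits, 16);
    }

    // Hypothetical check for noncharacters in the Basic Multilingual Plane:
    // U+FDD0..U+FDEF plus U+FFFE and U+FFFF.
    static boolean isNoncharacter(char c) {
        return (c >= 0xFDD0 && c <= 0xFDEF) || c == 0xFFFE || c == 0xFFFF;
    }

    // Hypothetical check: true if any surrogate in s is not part of a
    // well-formed high/low pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }

    // True if a UTF-8 encoder with the default REPORT action rejects s.
    // This explicit encoding step is where the checked exception can occur.
    static boolean strictUtf8Fails(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A lone high surrogate is a perfectly storable char value.
        String lone = String.valueOf(toChar("D800"));
        System.out.println(lone.length());               // 1
        System.out.println(hasUnpairedSurrogate(lone));  // true
        System.out.println(strictUtf8Fails(lone));       // true

        // String.getBytes does not throw; it substitutes '?' (63).
        System.out.println(lone.getBytes(StandardCharsets.UTF_8)[0]);

        // A noncharacter encodes without error, so nothing would flag it
        // unless the translation itself chose to.
        String nonChar = String.valueOf(toChar("FFFE"));
        System.out.println(isNoncharacter(nonChar.charAt(0))); // true
        System.out.println(strictUtf8Fails(nonChar));          // false
    }
}
```

If the "leave as-is" behaviour were adopted, checks along the lines of
isNoncharacter and hasUnpairedSurrogate could decide whether an escape
is translated or copied through unchanged. That is one possible policy,
not something the current patch does.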
-Chris.