RFR - JDK-8202442 - String::unescape (Code Review)

Chris Hegarty chris.hegarty at oracle.com
Thu Sep 20 10:45:57 UTC 2018


> On 19 Sep 2018, at 23:21, Stuart Marks <stuart.marks at oracle.com> wrote:
> 
> ...
> 
> 2979      * Each unicode escape in the form \unnnn is translated to the
> 2980      * unicode character whose code point is {@code 0xnnnn}. Care should be
> 2981      * taken when using UTF-16 surrogate pairs to ensure that the high
> 2982      * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
> 2983      * (U+DC00..U+DFFF) otherwise a
> 2984      * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
> 2985      * decoding.
> 
> 
> I know you're going to update this based on Naoto's comments, but I'd suggest rethinking this section. The \unnnn construct is called a "Unicode escape" per JLS 3.3, but how it's handled has little to do with Unicode. The nnnn digits are simply translated into a 16-bit 'char' value. Any such value will work, even if it's an invalid UTF-16 code unit (such as 0xFFF0) or an unpaired surrogate.

I had a similar comment/question. CharacterCodingException (CCE) is a
checked exception, and since the method does not declare that it throws
CCE, I took a look at the implementation and came to the same conclusion
as Stuart.
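
To illustrate the point, here is a minimal sketch. It is not the String::unescape code under review, and the class and method names are hypothetical. It assumes the \unnnn digits are converted with a plain cast to a 16-bit char, as Stuart describes. Building a string that contains an unpaired surrogate does not throw. A problem only shows up later, and only if the string is passed through a strict CharsetEncoder.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK or of the patch under review.
final class UnpairedSurrogateDemo {

    // Assumption: four hex digits become a char by a plain 16-bit conversion,
    // with no check for surrogates or noncharacters.
    static char fromHex(String nnnn) {
        return (char) Integer.parseInt(nnnn, 16);
    }

    // Encode with a fresh UTF-8 encoder, which reports malformed input
    // by throwing the checked CharacterCodingException.
    static boolean encodesStrictly(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A high surrogate followed by an ordinary letter: no exception here.
        String lone = "a" + fromHex("D800") + "b";
        System.out.println(lone.length());
        // The strict encoder rejects the unpaired surrogate.
        System.out.println(encodesStrictly(lone));
        // String.getBytes substitutes a replacement byte instead of throwing.
        System.out.println(lone.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Note that String.getBytes(Charset) never throws here either; it silently replaces the malformed input, so a CCE can only come from code that uses the encoder API directly.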

Additionally, why should noncharacter code points, like \uFFFE, be
translated at all? If an escape denotes a noncharacter or is part of a
malformed surrogate pair, would it not be better to just leave it as-is?
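
As a rough sketch of what such a check could look like (the class and method names are hypothetical and this is not proposed patch code), the two conditions can be tested with existing java.lang.Character methods plus the Unicode definition of noncharacters: U+FDD0..U+FDEF and the last two code points of every plane.

```java
// Hypothetical helper; a sketch of the checks only, not part of the JDK.
final class EscapeValidityCheck {

    // Unicode noncharacters: U+FDD0..U+FDEF, plus every code point whose
    // low 16 bits are FFFE or FFFF (the last two code points of each plane).
    static boolean isNoncharacter(int codePoint) {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
                || (codePoint & 0xFFFE) == 0xFFFE;
    }

    // True if the char at index i is a surrogate that is not part of a
    // well-formed high/low pair.
    static boolean isUnpairedSurrogate(CharSequence s, int i) {
        char c = s.charAt(i);
        if (Character.isHighSurrogate(c)) {
            return i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1));
        }
        if (Character.isLowSurrogate(c)) {
            return i == 0 || !Character.isHighSurrogate(s.charAt(i - 1));
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNoncharacter(0xFFFE));
        String lone = "a" + (char) 0xD800 + "b";
        System.out.println(isUnpairedSurrogate(lone, 1));
    }
}
```

Whether an unescape method should apply checks like these, and leave the escape text untouched when they fail, is exactly the open question above.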

-Chris.

