RFR - JDK-8202442 - String::unescape (Code Review)
Jim Laskey
james.laskey at oracle.com
Thu Sep 20 12:52:08 UTC 2018
Modified as per Stuart's suggestion.
> On Sep 20, 2018, at 7:45 AM, Chris Hegarty <chris.hegarty at oracle.com> wrote:
>
>
>> On 19 Sep 2018, at 23:21, Stuart Marks <stuart.marks at oracle.com> wrote:
>>
>> ...
>>
>> * Each unicode escape in the form \unnnn is translated to the
>> * unicode character whose code point is {@code 0xnnnn}. Care should be
>> * taken when using UTF-16 surrogate pairs to ensure that the high
>> * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
>> * (U+DC00..U+DFFF) otherwise a
>> * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
>> * decoding.
>>
>>
>> I know you're going to update this based on Naoto's comments, but I'd suggest rethinking this section. The \unnnn construct is called a "Unicode escape" per JLS 3.3, but how it's handled has little to do with Unicode. The nnnn digits are simply translated into a 16-bit 'char' value. Any such value will work, even if it's an invalid UTF-16 code unit (such as 0xFFF0) or an unpaired surrogate.
>
> I had a similar comment/question. CCE is a checked exception, and
> since the method does not declare that it throws CCE, I took a look
> at the implementation and came to the same conclusion as Stuart.
>
> Additionally, why should non-character code points, like \uFFFE, be
> translated? If it’s a non-character code point or a malformed surrogate
> pair, would it not be better to just leave it as-is?
>
> -Chris.
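
For readers following the thread, the point Stuart and Chris make above can be sketched roughly as follows. This is only an illustration, assuming that handling a \unnnn escape amounts to parsing four hex digits into a 16-bit char. The class UnicodeEscapeSketch and its helpers fromHex and encodesStrictly are hypothetical names and are not part of the String::unescape patch under review.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not the code under review.
class UnicodeEscapeSketch {

    // Assumption: a Unicode escape is just four hex digits turned into a char.
    // No validation is applied, so surrogates and noncharacters pass through.
    static char fromHex(String nnnn) {
        return (char) Integer.parseInt(nnnn, 16);
    }

    // A fresh CharsetEncoder reports malformed input by default, so an
    // unpaired surrogate makes encode() throw a CharacterCodingException.
    static boolean encodesStrictly(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A lone high surrogate is a perfectly legal char inside a String.
        String lone = "a" + fromHex("D800") + "b";
        System.out.println(lone.length());          // prints 3
        System.out.println(encodesStrictly(lone));  // prints false

        // The noncharacter U+FFFE is still an encodable scalar value.
        String nonChar = String.valueOf(fromHex("FFFE"));
        System.out.println(encodesStrictly(nonChar)); // prints true
    }
}
```

In this sketch the translation itself never fails. Any CharacterCodingException would only surface later, in a separate strict encoding step, which is consistent with the method not declaring that checked exception.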