<i18n dev> Codereview request for 7096080: UTF8 update and new CESU-8 charset
Xueming Shen
xueming.shen at oracle.com
Thu Sep 29 15:27:46 PDT 2011
On 09/29/2011 02:16 PM, Ulf Zibis wrote:
> Please use spaces with ternary operators: Lines 155, 216
>
> For short, you could use sr instead of srcRemaining, consistent with sa, sp, sl.
>
> 420 // returns -1 if there is malformed byte(s) and the
> better:
> 420 // returns -1 if there is/are malformed byte(s) and the
>
> 466 sp -=3;
> There should be a space: sp -= 3;
Webrev has been updated accordingly.
>
> 280 if (Character.isSurrogate(c))
> 281 return malformedForLength(src, sp, dst, dp, 3);
> Shouldn't we return cr.length() = 1 to allow the remaining 2 bytes to be
> interpreted again?
>
Actually, I don't know the answer. My reading of D93a/D93b suggests that
we might interpret it as a whole: the bytes are in the well-formed byte
pattern range listed in Table 3.7, and are "ill-formed" only because they
encode a surrogate value rather than a scalar value, so I would interpret
the whole 3 bytes as a maximal subpart. Given that D93a/b is "best
practices for using U+FFFD", either way should be fine. We do have Unicode
experts on the list, so maybe they can share their opinion on what the
"desired"/recommended behavior is in this case, from the Standard's point
of view.
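
To make the two options concrete, here is a minimal sketch of the choice
being discussed. It is not the UTF_8.java code from the webrev: the class
name, the decode() helper and its surrogateLen parameter are hypothetical,
and the toy decoder only understands ASCII and 3-byte sequences (it does
not even reject overlong forms). It only shows how a malformed length of 3
versus 1 changes the number of U+FFFD replacements for ED A0 80.

```java
final class SurrogateMalformedSketch {

    // True for a UTF-8 continuation byte of the form 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Assemble a char from the pattern 1110xxxx 10xxxxxx 10xxxxxx.
    static char decode3(byte b1, byte b2, byte b3) {
        return (char) (((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
    }

    // Hypothetical toy decoder. surrogateLen is the malformed length used
    // when a 3-byte sequence decodes to a surrogate: 3 skips the whole
    // sequence, 1 skips only the lead byte and re-reads the other two.
    static String decode(byte[] in, int surrogateLen) {
        StringBuilder sb = new StringBuilder();
        int sp = 0;
        while (sp < in.length) {
            byte b1 = in[sp];
            if (b1 >= 0) {                          // ASCII
                sb.append((char) b1);
                sp++;
            } else if ((b1 & 0xF0) == 0xE0 && sp + 2 < in.length
                    && isContinuation(in[sp + 1])
                    && isContinuation(in[sp + 2])) {
                char c = decode3(b1, in[sp + 1], in[sp + 2]);
                if (Character.isSurrogate(c)) {
                    sb.append('\uFFFD');            // one replacement per malformed unit
                    sp += surrogateLen;
                } else {
                    sb.append(c);
                    sp += 3;
                }
            } else {
                sb.append('\uFFFD');                // everything else: not handled here
                sp++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ED A0 80 is the 3-byte pattern for the surrogate U+D800, followed by 'A'.
        byte[] in = {(byte) 0xED, (byte) 0xA0, (byte) 0x80, 'A'};
        System.out.println("malformed length 3: " + decode(in, 3).length() + " chars");
        System.out.println("malformed length 1: " + decode(in, 1).length() + " chars");
    }
}
```

With a malformed length of 3 the sequence collapses into a single U+FFFD;
with a length of 1 the two trailing bytes are re-read as stray continuation
bytes and each gets its own U+FFFD.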
>
> Am 29.09.2011 05:27, schrieb Xueming Shen:
>> Hi,
>>
>> On 9/28/2011 3:44 PM, Ulf Zibis wrote:
>>> 5. IMHO charset CESU-8 should be hosted in extended-charsets,
>>> otherwise it should be added to java.nio.StandardCharsets
>>>
>>
>> We have lots of charsets provided via the "standard charset provider"
>> (in rt.jar) but not listed in StandardCharsets.
> Yes, but the reason to add CESU-8 to StandardCharsets was the
> supposed demand to treat all Unicode charsets equivalently.
>
> Otherwise there is no obstacle to host CESU-8 in extended-charsets.
> IMHO, CESU-8 addresses corner case compatibility issues, but not
> "standard" requirements.
Putting CESU-8 into the "standard charset provider" (which is only an
implementation detail) does not mean it is a "standard" requirement; it
just means it is bundled into rt.jar. I put it there to keep it together
with UTF-8, on the assumption that you might need it around when using
the updated UTF-8, which no longer handles those 3/6-byte surrogates.
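
For readers unfamiliar with the 3/6-byte form, the following sketch shows
the difference for one supplementary character. The class and the
hand-rolled cesu8OfSupplementary() helper are hypothetical and exist only
for illustration; the lookup of a charset named "CESU-8" is guarded with
Charset.isSupported() because its presence depends on the runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Cesu8Sketch {

    // Write one UTF-16 code unit in U+0800..U+FFFF as 1110xxxx 10xxxxxx 10xxxxxx.
    static void put3(char c, byte[] out, int off) {
        out[off]     = (byte) (0xE0 | (c >> 12));
        out[off + 1] = (byte) (0x80 | ((c >> 6) & 0x3F));
        out[off + 2] = (byte) (0x80 | (c & 0x3F));
    }

    // Hypothetical helper: CESU-8 encodes each surrogate of the pair
    // separately, giving 6 bytes for one supplementary code point.
    static byte[] cesu8OfSupplementary(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        char[] pair = Character.toChars(codePoint);   // { high, low }
        byte[] out = new byte[6];
        put3(pair[0], out, 0);
        put3(pair[1], out, 3);
        return out;
    }

    // Format bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x10400;
        String s = new String(Character.toChars(cp));
        System.out.println("UTF-8 (4 bytes):       " + hex(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("CESU-8, hand-rolled:   " + hex(cesu8OfSupplementary(cp)));
        if (Charset.isSupported("CESU-8")) {
            System.out.println("CESU-8, from runtime:  " + hex(s.getBytes(Charset.forName("CESU-8"))));
        }
    }
}
```

The 6-byte result is exactly the pair of 3-byte surrogate sequences that
the updated UTF-8 decoder now rejects as malformed.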
-Sherman