<i18n dev> Codereview request for 7096080: UTF8 update and new CESU-8 charset
Ulf Zibis
Ulf.Zibis at gmx.de
Sun Oct 2 14:36:36 PDT 2011
Hi again,
On 30.09.2011 00:27, Xueming Shen wrote:
> On 09/29/2011 02:16 PM, Ulf Zibis wrote:
>>
>> 280 if (Character.isSurrogate(c))
>> 281 return malformedForLength(src, sp, dst, dp, 3);
>> Shouldn't we return cr.length() = 1 to allow the remaining 2 bytes to be interpreted again?
>>
Forget it! If c is a surrogate, b2 is in the range A0..BF and b3 is in the range 80..BF. Neither of
them can be well-formed as a first byte, so nothing is gained by interpreting them again.
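
To double-check the ranges, here is a small sketch (not the JDK code). The class SurrogateBytes and the helpers encode3 and canBeLeadByte are hypothetical names made up for this illustration. It packs every surrogate into the 3-byte pattern and verifies that b2 and b3 can never be a lead byte, assuming the lead-byte ranges 00..7F and C2..F4 from the well-formed byte sequence table:

```java
public class SurrogateBytes {

    // Hypothetical helper: packs a 16-bit value into the 3-byte UTF-8
    // pattern 1110xxxx 10xxxxxx 10xxxxxx without any validity check.
    public static int[] encode3(char c) {
        return new int[] {
            0xE0 | (c >> 12),
            0x80 | ((c >> 6) & 0x3F),
            0x80 | (c & 0x3F)
        };
    }

    // Hypothetical helper: true if b could start a well-formed UTF-8
    // sequence (assumed lead-byte ranges: 00..7F and C2..F4).
    public static boolean canBeLeadByte(int b) {
        return (b >= 0x00 && b <= 0x7F) || (b >= 0xC2 && b <= 0xF4);
    }

    public static void main(String[] args) {
        for (int c = Character.MIN_SURROGATE; c <= Character.MAX_SURROGATE; c++) {
            int[] b = encode3((char) c);
            // The first byte is always ED, b2 is in A0..BF, b3 is in 80..BF.
            if (b[0] != 0xED || b[1] < 0xA0 || b[1] > 0xBF
                    || b[2] < 0x80 || b[2] > 0xBF) {
                throw new AssertionError("unexpected bytes for U+"
                        + Integer.toHexString(c).toUpperCase());
            }
            // Neither trailing byte can begin a new well-formed sequence.
            if (canBeLeadByte(b[1]) || canBeLeadByte(b[2])) {
                throw new AssertionError("trailing byte usable as lead byte");
            }
        }
        System.out.println("checked all surrogates");
    }
}
```

So whether the decoder reports a malformed length of 1 or 3, the two trailing bytes would be rejected anyway; only the number of reported malformed units differs.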
> Actually I don't know the answer. My reading of D93a/D93b suggests that we might
> interpret it as a whole, given that the bytes are actually in the well-formed byte pattern
> range listed in Table 3.7 and are "ill-formed" simply because they encode a surrogate value,
> not a scalar value, so I would interpret the whole 3 bytes as a maximal subpart. Given that
> D93a/b is "best practices for using U+FFFD", either way should be fine. We do have Unicode
> experts on the list, so maybe they can share their opinion on what the "desired"/recommended
> behavior is in this case, from the Standard's point of view?
At line 102 you could insert:
// [E0] [A0..BF]
// [E1..EF] [80..BF]
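
As a rough illustration of what those two comment lines describe, here is a sketch (again not the JDK code). The class ThreeByteSecondByte and the method isSecondByteInRange are hypothetical names. It checks only the second byte of a 3-byte sequence and deliberately lets ED A0..BF through, on the assumption that surrogates are rejected separately, as in lines 280-281 above:

```java
public class ThreeByteSecondByte {

    // Hypothetical helper: checks the second byte of a 3-byte sequence
    // against the ranges from the suggested comment:
    //   [E0]     [A0..BF]
    //   [E1..EF] [80..BF]
    // Surrogates (ED A0..BF) pass here on purpose; this sketch assumes
    // they are rejected by a separate Character.isSurrogate check.
    public static boolean isSecondByteInRange(int b1, int b2) {
        b1 &= 0xFF; // accept signed byte values as well
        b2 &= 0xFF;
        if (b1 == 0xE0) {
            return b2 >= 0xA0 && b2 <= 0xBF; // excludes overlong forms
        }
        if (b1 >= 0xE1 && b1 <= 0xEF) {
            return b2 >= 0x80 && b2 <= 0xBF; // plain continuation byte
        }
        return false; // not a 3-byte lead byte
    }

    public static void main(String[] args) {
        System.out.println(isSecondByteInRange(0xE0, 0x9F)); // overlong
        System.out.println(isSecondByteInRange(0xE0, 0xA0)); // lowest valid
        System.out.println(isSecondByteInRange(0xED, 0xA0)); // surrogate range
    }
}
```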
-Ulf