<i18n dev> Codereview request for 7096080: UTF8 update and new CESU-8 charset
Xueming Shen
xueming.shen at oracle.com
Fri Sep 30 13:46:42 PDT 2011
On 09/30/2011 07:09 AM, Ulf Zibis wrote:
>>>
>>
>> (1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} --->
>> CoderResult.malformedForLength(1)
>> It appears the Unicode Standard now explicitly recommends returning
>> a malformed length of 2 for this scenario, which is what UTF-8 is
>> doing now.
> My idea is that, in the case of malformed length 1, a subsequent
> call to the decode loop would return another malformed length 1,
> ensuring 2 replacement chars in the output string. (Not sure if that
> is expected in this corner case.)
The Unicode Standard's "best practices" (D93a/b) recommend returning 2
in this case.
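
A minimal sketch of how the malformed length for this input can be checked, assuming a JDK with java.nio.charset.StandardCharsets (7 or later). The class and method names are hypothetical and not part of the webrev. With CodingErrorAction.REPORT the decoder stops at the first malformed sequence and returns a CoderResult whose length() is the value under discussion. On a JDK that follows the D93a/b recommendation, the result for E1 80 42 should be a malformed length of 2.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class MalformedLengthDemo {

    // Decode with REPORT so the first malformed sequence is returned
    // as a CoderResult instead of being replaced with U+FFFD.
    static CoderResult firstResult(byte[] input) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(input), CharBuffer.allocate(8), true);
    }

    public static void main(String[] args) {
        // 0xE1 0x80 is a truncated 3-byte sequence; 0x42 ('B') is not a continuation byte.
        byte[] input = {(byte) 0xE1, (byte) 0x80, (byte) 0x42};
        CoderResult r = firstResult(input);
        System.out.println("malformed=" + r.isMalformed() + " length=" + r.length());
    }
}
```

With a malformed length of 2, a decoder using the REPLACE action emits one replacement char for E1 80 and then decodes 0x42 normally. With a length of 1, it would emit two replacement chars, which is the behavior Ulf describes above.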
> 3. Consider additionally 6795537 - UTF_8$Decoder returns wrong results
> <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6795537>
>
>
>> I'm not sure I understand the suggested b1 < -0x3e patch, I don't
>> see we can simply replace
>> ((b1 >> 5) == -2) with (b1 < -0x3e).
> You must see the b1 < -0x3e in combination with the following b1 <
> -0x20. ;-)
>
> But now I have a better "if...else if" switch. :-)
> - saves the shift operations
> - only 1 comparison per case
> - only 1 constant to load per case
> - helps compiler to benefit from 1 byte constants and op-codes
> - much more readable
I believe we changed from (b1 < xyz) to (b1 >> x) == -2 back in 2009(?)
because the benchmark showed the "shift" version to be slightly faster.
Do you have any numbers that show a difference now? My non-scientific
benchmark still suggests the "shift" version is faster on the -server VM,
with no significant difference on the -client VM.
------------------ your new switch ---------------
(1) -server
Method                 Millis    Ratio
Decoding 1b UTF-8 :       125    1.000
Decoding 2b UTF-8 :      2558   20.443
Decoding 3b UTF-8 :      3439   27.481
Decoding 4b UTF-8 :      2030   16.221
(2) -client
Method                 Millis    Ratio
Decoding 1b UTF-8 :       335    1.000
Decoding 2b UTF-8 :      1041    3.105
Decoding 3b UTF-8 :      2245    6.694
Decoding 4b UTF-8 :      1254    3.741
------------------ existing "shift" ---------------
(1) -server
Method                 Millis    Ratio
Decoding 1b UTF-8 :       134    1.000
Decoding 2b UTF-8 :      1891   14.106
Decoding 3b UTF-8 :      2934   21.886
Decoding 4b UTF-8 :      2133   15.913
(2) -client
Method                 Millis    Ratio
Decoding 1b UTF-8 :       341    1.000
Decoding 2b UTF-8 :       949    2.560
Decoding 3b UTF-8 :      2321    6.255
Decoding 4b UTF-8 :      1278    3.446
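
For reference, the two lead-byte dispatch styles being compared can be sketched as below. This is a hypothetical reconstruction, not the code from either webrev: the class and method names are made up, the comparison constants beyond -0x3e and -0x20 are assumed by analogy, and the shift variant assumes the extra (b1 & 0x1e) != 0 check that rejects the overlong leads 0xC0 and 0xC1. Both methods only classify the first byte. The real decoder also validates the continuation bytes and the resulting code point.

```java
// Hypothetical sketch of the two lead-byte classification styles.
// Each method returns the expected sequence length, or 0 for an invalid lead byte.
class LeadByteDispatch {

    // "Shift" style: match the high bits of the signed byte.
    static int lengthByShift(byte b1) {
        if (b1 >= 0) return 1;                                   // 0xxxxxxx
        if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0) return 2;       // 110xxxxx, excluding 0xC0 and 0xC1
        if ((b1 >> 4) == -2) return 3;                           // 1110xxxx
        if ((b1 >> 3) == -2) return 4;                           // 11110xxx
        return 0;                                                // continuation byte or invalid lead
    }

    // "Compare" style: an if...else-if chain over ascending signed thresholds.
    // Each test is only meaningful together with the tests before it.
    static int lengthByCompare(byte b1) {
        if (b1 >= 0) return 1;          // 0x00..0x7F
        if (b1 < -0x3e) return 0;       // 0x80..0xC1: continuation or overlong lead
        if (b1 < -0x20) return 2;       // 0xC2..0xDF
        if (b1 < -0x10) return 3;       // 0xE0..0xEF
        if (b1 < -0x08) return 4;       // 0xF0..0xF7
        return 0;                       // 0xF8..0xFF
    }

    public static void main(String[] args) {
        // Check that both styles agree on every possible byte value.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (lengthByShift(b) != lengthByCompare(b)) {
                throw new IllegalStateException("mismatch at 0x" + Integer.toHexString(i));
            }
        }
        System.out.println("both classifiers agree on all 256 byte values");
    }
}
```

Since the two forms classify every byte identically under these assumptions, the choice between them comes down to the measured numbers above and to readability.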
-sherman