Codereview request for 7096080: UTF8 update and new CESU-8 charset

Fri Oct 14 08:47:05 UTC 2011

On 30.09.2011 22:46, Xueming Shen wrote:
> I believe we changed from (b1 < xyz) to (b1 >> x) == -2 back in 2009(?) because
> the benchmark showed the "shift" version to be slightly faster. Do you have any numbers
> that show a difference now? My non-scientific benchmark still suggests the "shift"
> version is faster on the -server VM, with no significant difference on the -client VM.
My new guess for the reason:
The bytes are widened to int to serve the isNotContinuation / isMalformedxx methods.
So those methods should be coded in byte logic too.
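
To make the comparison concrete, here is a minimal sketch of the two styles side by side. The class and method names are hypothetical and are not taken from the webrev; the int-masking form and the "shift" form are only assumed to resemble the existing code. The sketch checks that both styles classify all 256 byte values identically. It says nothing about speed, and note that the Java language still promotes byte operands to int for comparisons, so whether the JIT really avoids the widening is an open question.

```java
public class ByteLogicSketch {
    // Int-masking style (assumed form): the byte argument is widened to int first.
    static boolean isNotContinuationInt(int b) {
        return (b & 0xc0) != 0x80;
    }

    // Byte-comparison style: continuation bytes 0x80..0xBF are -128..-65
    // as signed bytes, so everything >= (byte) 0xC0 (-64) is not a continuation.
    static boolean isNotContinuationByte(byte b) {
        return b >= (byte) 0xC0;
    }

    // "Shift" style for a 2-byte lead (110xxxxx): arithmetic shift of the
    // sign-extended value yields -2 exactly for 0xC0..0xDF.
    static boolean isTwoByteLeadShift(byte b1) {
        return (b1 >> 5) == -2;
    }

    // Comparison style for the same range: -64 <= b1 < -32.
    static boolean isTwoByteLeadCompare(byte b1) {
        return b1 >= (byte) 0xC0 && b1 < (byte) 0xE0;
    }

    // Returns true if both styles agree for every possible byte value.
    static boolean allAgree() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if (isNotContinuationInt(b) != isNotContinuationByte(b)) return false;
            if (isTwoByteLeadShift(b) != isTwoByteLeadCompare(b)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("styles agree: " + allAgree());
    }
}
```

Since the variants are functionally equivalent, any difference in the numbers would have to come from the code the compilers generate for them.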

But the big question remains: why is c1 (-client) faster than c2 (-server), except for 1b?

-Ulf

>
>   ------------------  your new switch---------------
> (1) -server
> Method                      Millis  Ratio
> Decoding 1b UTF-8 :            125  1.000
> Decoding 2b UTF-8 :           2558 20.443
> Decoding 3b UTF-8 :           3439 27.481
> Decoding 4b UTF-8 :           2030 16.221
> (2) -client
> Decoding 1b UTF-8 :            335  1.000
> Decoding 2b UTF-8 :           1041  3.105
> Decoding 3b UTF-8 :           2245  6.694
> Decoding 4b UTF-8 :           1254  3.741
>
>   ------------------ existing "shift"---------------
> (1) -server
> Decoding 1b UTF-8 :            134  1.000
> Decoding 2b UTF-8 :           1891 14.106
> Decoding 3b UTF-8 :           2934 21.886
> Decoding 4b UTF-8 :           2133 15.913
> (2) -client
> Decoding 1b UTF-8 :            341  1.000
> Decoding 2b UTF-8 :            949  2.560
> Decoding 3b UTF-8 :           2321  6.255
> Decoding 4b UTF-8 :           1278  3.446
>
>
>
> -sherman
>