Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])

Fri Jul 13 17:09:36 UTC 2012

On 07/13/2012 05:19 AM, Alan Bateman wrote:
> On 11/07/2012 00:11, Xueming Shen wrote:
>> Hi,
>>
>> In JDK7, the decoder and encoder implementations of most of our
>> single-byte charsets and the UTF-8 charset are optimized to implement
>> the internal interface sun.nio.cs.ArrayDecoder/Encoder to provide a
>> fastpath for String.getBytes(...) and new String(byte[]...)
>> operations. I have an old blog entry regarding this optimization at
>>
>> https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs
>>
>> This RFE, as a follow-up to the above changes, implements
>> ArrayDe/Encoder for most of the sun.nio.cs.ext.DoubleByte based
>> double-byte charsets. Here is the webrev
>>
>> http://cr.openjdk.java.net/~sherman/7183053/webrev
> I've taken a pass over this and it's great to see 
> DoubleByte.Decoder/Encoder implementing 
> sun.nio.cs.ArrayDecoder/Encoder. The results look good too: there are a
> small number of regressions (Big5 at len=32, for example), but this is a
> micro benchmark and I'm sure there are fluctuations. I don't see anything
> obviously wrong with the EBCDIC changes, although I'd need a history book
> to remember how the shifts between DBCS and SBCS work. I think our tests
> are good for this area, so I'm happy. One minor nit is the continue in
> both encode methods; I think it would be cleaner to use "else if (bb ..."
> instead.

The continue might make the VM happy, but this is code I wrote last
October, so I might be wrong. I will do a couple of runs later with "else".
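
To make the two loop shapes under discussion concrete, here is a minimal
sketch. It is not the code from the webrev: the class name, the toy
encodeChar mapping, and the constants UNMAPPABLE and MAX_SINGLEBYTE are all
assumptions made up for illustration. The only point is that the "continue"
form and the "else if" form produce the same bytes, so the choice between
them comes down to readability and whatever the VM happens to compile better.

```java
import java.util.Arrays;

class EncodeLoopSketch {
    // Hypothetical constants, not taken from the real DoubleByte tables.
    static final int UNMAPPABLE = 0xFFFD;
    static final int MAX_SINGLEBYTE = 0xFF;

    // Toy mapping (an assumption): ASCII maps to one byte, U+4E00..U+4EFF
    // maps to a two-byte value, everything else is unmappable.
    static int encodeChar(char c) {
        if (c < 0x80) return c;
        if (c >= 0x4E00 && c <= 0x4EFF) return 0xA100 + (c - 0x4E00);
        return UNMAPPABLE;
    }

    // Shape 1: handle the unmappable case, then "continue".
    static int encodeWithContinue(char[] src, int sp, int len, byte[] dst) {
        int dp = 0;
        int sl = sp + len;
        while (sp < sl) {
            int bb = encodeChar(src[sp++]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = (byte) '?';      // replacement byte
                continue;
            }
            if (bb > MAX_SINGLEBYTE) {       // double-byte
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {                         // single-byte
                dst[dp++] = (byte) bb;
            }
        }
        return dp;
    }

    // Shape 2: the same logic as a single if / else if / else chain.
    static int encodeWithElseIf(char[] src, int sp, int len, byte[] dst) {
        int dp = 0;
        int sl = sp + len;
        while (sp < sl) {
            int bb = encodeChar(src[sp++]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = (byte) '?';      // replacement byte
            } else if (bb > MAX_SINGLEBYTE) { // double-byte
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {                         // single-byte
                dst[dp++] = (byte) bb;
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        char[] src = {'A', '\u4E00', '\u00E9', 'z'};
        byte[] a = new byte[src.length * 2]; // worst case: 2 bytes per char
        byte[] b = new byte[src.length * 2];
        int na = encodeWithContinue(src, 0, src.length, a);
        int nb = encodeWithElseIf(src, 0, src.length, b);
        System.out.println("same output: " + (na == nb && Arrays.equals(a, b)));
    }
}
```

Any performance difference between the two shapes would have to be measured
on the real encoder; this sketch says nothing about that.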

>
> I see in TestStringCoding.java that you've commented out the test that 
> goes over the buffer limit - would I be correct to say that this isn't 
> an issue and this happens with DB charsets today?
>

This is also true for the UTF-8 work I did last year, but UTF-8 is
excluded at the beginning of the test. The SB coders take advantage of the
fact that the output char[] is always the same length as the input bytes,
so the limit can be checked once at the very beginning. For MB, checking
both sp and dp inside the loop slows down the de/encoding (the VM
obviously does not like too many "if"s). Given that this is an internal
interface used exclusively by StringCoding, which has already calculated
the max buffer size to feed in, I think the check is something that can be
optimized away.
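
As an illustration of the caller-side contract described above, here is a
rough sketch that uses only the public java.nio.charset API. It does not
touch the internal sun.nio.cs.ArrayEncoder interface, and the class and
method names are made up for this example. The caller sizes the output for
the worst case from maxBytesPerChar(), so the encoder cannot run past the
end of the destination, and the result is trimmed afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PresizedEncodeSketch {
    // Encode ca[off..off+len) into a buffer sized up front for the worst
    // case, then trim to the number of bytes actually written.
    static byte[] encode(Charset cs, char[] ca, int off, int len) {
        CharsetEncoder ce = cs.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

        // Worst-case output size, computed once by the caller.
        int max = (int) Math.ceil(len * (double) ce.maxBytesPerChar());
        byte[] ba = new byte[max];

        ByteBuffer bb = ByteBuffer.wrap(ba);
        CharBuffer cb = CharBuffer.wrap(ca, off, len);
        CoderResult cr = ce.encode(cb, bb, true);
        if (cr.isUnderflow()) {
            cr = ce.flush(bb);
        }
        if (!cr.isUnderflow()) {
            // Not expected with a worst-case sized buffer and REPLACE actions.
            throw new IllegalStateException("unexpected result: " + cr);
        }
        return Arrays.copyOf(ba, bb.position()); // trim to actual length
    }

    public static void main(String[] args) {
        char[] ca = "h\u00e9llo".toCharArray();
        byte[] out = encode(StandardCharsets.UTF_8, ca, 0, ca.length);
        System.out.println(out.length + " bytes");
    }
}
```

The public CharsetEncoder still performs its own limit checks internally;
the sketch only shows why a caller that pre-sizes the buffer this way makes
a per-iteration destination check redundant in principle.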

-Sherman

> Ulf - you've got several patches to the double byte charsets and I 
> wonder if you have cycles to try Sherman's patch with jdk8 to see if 
> there is any more to be gained?
>
> -Alan.
>