Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

Thu Apr 28 21:28:43 UTC 2011

On 04/28/2011 01:55 PM, Ulf Zibis wrote:
> On 28.04.2011 21:56, Xueming Shen wrote:
>> That said, you do have a point: we should do better even in the
>> malformed case, ...
> Yes, that's what I wanted to point out.
> But I thought you could go one step further and declare bb as a member
> of UTF_8.Decoder. That would require a guarantee that a decoder is used
> by only one thread at a time. I don't know whether that holds for the
> typical use cases.

Why do you want to "re-use" a ByteBuffer object across
decode(byte[]...) invocations?
I don't see any benefit in doing that.
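
To make the point concrete, here is a minimal sketch of the per-call
approach. It is not the code under review: the class and method names
are hypothetical, and it assumes a decoder configured to replace
malformed input. Each call wraps the caller's array with
ByteBuffer.wrap(), which creates only a small view object and copies no
bytes, so nothing has to be shared between threads.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

// Hypothetical illustration only; not the StringCoding/UTF_8 sources.
class WrapPerCallSketch {

    // Assumption: malformed and unmappable input is replaced, not reported.
    static CharsetDecoder newUtf8Decoder() {
        return Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Decodes ba[off, off+len) with a fresh ByteBuffer view on every call.
    static char[] decode(CharsetDecoder cd, byte[] ba, int off, int len) {
        // Upper bound on the number of chars the decoder can produce.
        char[] ca = new char[(int) (len * (double) cd.maxCharsPerByte())];
        if (len == 0) {
            return ca;
        }
        cd.reset();
        ByteBuffer bb = ByteBuffer.wrap(ba, off, len); // view, no copy
        CharBuffer cb = CharBuffer.wrap(ca);
        cd.decode(bb, cb, true);
        cd.flush(cb);
        // Trim to the number of chars actually written.
        return Arrays.copyOf(ca, cb.position());
    }

    public static void main(String[] args) {
        CharsetDecoder cd = newUtf8Decoder();
        byte[] in = {'h', 'i'};
        System.out.println(new String(decode(cd, in, 0, in.length)));
    }
}
```

Keeping bb as a field would save only that wrapper allocation, and in
exchange the decoder would carry extra mutable state that must never be
touched by two threads at once.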

> In http://cr.openjdk.java.net/~mduigou/4884238/2/webrev/ I've seen the
> change to use a constant Charset object instead of a constant charset
> name in some method calls. Your benchmark suggests that constant
> charset names give a small performance gain (0..25 %), so I don't see
> the benefit of the changes from 4884238, which go in the opposite
> direction.
>

That is a totally different topic:-)

Yes, you don't benefit from using a "Charset object" when calling
String.getBytes()/toCharArray(), because of our caching optimization in
the StringCoding class. But that is a pure implementation detail. It's
safe to say that java.nio.cs.StandardCharset is not for
String.getBytes()/toCharArray() only, so the fact that the "cs" variant
of String.getBytes()/toCharArray() is "slower" than its "csn" variant is
arguably not a very strong argument in that discussion :-)
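
For readers following along, the two variants look like this from the
caller's side. This is only an illustrative sketch with hypothetical
class and method names. Both calls produce the same bytes; any speed
difference comes from how the implementation caches coders, not from the
API contract.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration of the "csn" and "cs" call styles.
class CsVsCsnSketch {

    // A constant Charset object, looked up once.
    static final Charset UTF8 = Charset.forName("UTF-8");

    // "csn" variant: the charset is passed by name.
    static byte[] byName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // "cs" variant: the charset is passed as an object, so there is
    // no name lookup and no checked exception to handle.
    static byte[] byCharset(String s) {
        return s.getBytes(UTF8);
    }

    public static void main(String[] args) {
        String s = "gr\u00fc\u00dfe";
        System.out.println(Arrays.equals(byName(s), byCharset(s)));
    }
}
```

A Charset constant is also usable wherever a Charset is accepted, for
example Charset.newEncoder() or the InputStreamReader constructor, which
is why its value does not hinge on these two String methods.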

-Sherman

> -Ulf
>