StandardCharset vs. StandardCharsets

Sat May 7 18:54:53 UTC 2011

On 05-07-2011 9:00 AM, Rémi Forax wrote:
> On 05/07/2011 02:17 PM, Ulf Zibis wrote:
>> Hi all,
>>
>> please excuse that I still have trouble accepting this additional 
>> class, but +1 for the plural name.
>>
>> If those charset constants are there, people _will use_ them without 
>> regard for the existing _performance disadvantages_.
>> A typical use case would be: String.getBytes(...)
>> On small strings there is a performance loss of up to 25 % using the 
>> charset variant vs. the charset name variant. See:
>> http://cr.openjdk.java.net/~sherman/7040220/client
>> http://markmail.org/message/2tbas5skgkve52mz
>> http://markmail.org/thread/lnrozcbnpcl5kmzs
>>
>> So I still think we should have the standard charset names as 
>> constants in class j.n.c.Charset:
>> public static final String UTF_8 = "UTF-8"; etc... 
>
> Using objects instead of strings is a better design.
> I see the fact that the String method variants that take a Charset 
> are slower than the ones that take a String
> as a performance bug, not as a design issue.
>
> The String method that takes a Charset should reuse the class-local 
> decoder
> and the performance problem will go away.
> (The analysis in StringCoding.decode(Charset, ...) (point 1) forgets 
> that initializing a decoder also has a cost)

I do know the "slowness" comes from initializing
cs.newDecoder()/newEncoder() :-) But it is just not "easy" to cache the
de/encoder in this case. There is no guarantee that the cs passed in
this time is the same one you had last time, even though the name might
be the same. And even if the cs this time is indeed the same instance
you had last time (the one you cached for), there is no guarantee that
the dec/enc returned from newDecoder()/newEncoder() this time will be
the same as the one in your cache until you invoke
newDecoder()/newEncoder(), get the dec/enc and compare it to the one in
your cache, but then why cache it :-)

Something you can do is cache only if the cs passed in is indeed one
from our own charset repository (which can be trusted not to do
something tricky). You can check this by invoking
getClassLoader0() == null, which is expensive; as far as I remember, the
measurements showed this might not be worth doing the last time I
looked at it. Sure, the charsets in StandardCharsets can be treated
specially, if desirable, probably only ASCII, ISO-8859-1 and UTF-8,
such as

if (cs == StandardCharsets.UTF_8 || cs == StandardCharsets.US_ASCII...) {
...
}
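
To make the special-casing above concrete, here is a minimal sketch in user-level code, not the actual StringCoding implementation. It assumes a hypothetical helper class TrustedCharsetEncoding, handles only UTF-8 on the fast path, and uses ThreadLocal.withInitial (Java 8 or later). The identity check (==) is what makes the cached encoder safe to reuse: a user-supplied Charset that merely shares the name never hits the cache.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class TrustedCharsetEncoding {
    // One cached UTF-8 encoder per thread: CharsetEncoder is not thread-safe.
    private static final ThreadLocal<CharsetEncoder> UTF8_ENCODER =
        ThreadLocal.withInitial(() -> replacing(StandardCharsets.UTF_8.newEncoder()));

    // Mirror String.getBytes(Charset): replace bad input instead of throwing.
    private static CharsetEncoder replacing(CharsetEncoder enc) {
        return enc.onMalformedInput(CodingErrorAction.REPLACE)
                  .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    static byte[] encode(String s, Charset cs) {
        CharsetEncoder enc;
        if (cs == StandardCharsets.UTF_8) {
            // Identity check: this exact instance is known and trusted,
            // so the cached encoder can be reused.
            enc = UTF8_ENCODER.get();
        } else {
            // Unknown charset instance: no caching, pay for a new encoder.
            enc = replacing(cs.newEncoder());
        }
        try {
            // encode(CharBuffer) resets the encoder before encoding,
            // so state left over from a previous call does not leak in.
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions configured.
            throw new IllegalStateException(e);
        }
    }
}
```

Whether the saved newEncoder() call outweighs the ThreadLocal lookup on small strings is something only a measurement can tell.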

I will do some measurements later to see whether separating the "else"
part into a side method speeds things up a little. We can do that if
the inlining does help, but not for 7 :-)
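
For reference, the "side method" idea looks roughly like this. It is only a sketch of the general pattern, assuming a hypothetical class SplitDecode with an ISO-8859-1 fast path standing in for the ArrayDecoder branch; it is not the StringCoding code. The point is that the hot method stays small, since JIT inlining decisions depend in part on method bytecode size, while the rarely taken general path lives in its own method.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class, for illustration only.
final class SplitDecode {
    // Hot method: only the fast path plus a call to the slow path.
    static String decode(byte[] bytes, Charset cs) {
        if (cs == StandardCharsets.ISO_8859_1) {
            // Fast path: each ISO-8859-1 byte maps directly to one char.
            char[] chars = new char[bytes.length];
            for (int i = 0; i < bytes.length; i++) {
                chars[i] = (char) (bytes[i] & 0xff);
            }
            return new String(chars);
        }
        return decodeSlow(bytes, cs);
    }

    // The "else" part moved to a side method so it does not add to the
    // bytecode size of decode().
    private static String decodeSlow(byte[] bytes, Charset cs) {
        return cs.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```

Whether the split actually helps is exactly what the measurement would have to show.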

Thanks!
-Sherman

> Rémi
> PS: also the else part of if(c instanceof ArrayDecoder) should be in a 
> side method to ease
> the inlining of decode().
>