StandardCharset vs. StandardCharsets
Rémi Forax
forax at univ-mlv.fr
Sun May 8 13:09:23 UTC 2011
On 05/07/2011 08:54 PM, Xueming Shen wrote:
> On 05-07-2011 9:00 AM, Rémi Forax wrote:
>> On 05/07/2011 02:17 PM, Ulf Zibis wrote:
>>> Hi all,
>>>
>>> please excuse that I still have trouble accepting this additional
>>> class, but +1 for the plural name.
>>>
>>> If those charset constants are there, people _will use_ them without
>>> regard for the existing _performance disadvantages_.
>>> A typical use case would be: String.getBytes(...)
>>> On small strings there is a performance loss of up to 25% using the
>>> charset variant vs. the charset name variant. See:
>>> http://cr.openjdk.java.net/~sherman/7040220/client
>>> http://markmail.org/message/2tbas5skgkve52mz
>>> http://markmail.org/thread/lnrozcbnpcl5kmzs
>>>
>>> So I still think we should have the standard charset names as
>>> constants in class j.n.c.Charset:
>>> public static final String UTF_8 = "UTF-8"; etc...
>>
>> Using objects instead of strings is a better design.
>> I see the fact that the String method variants that take a Charset
>> are slower than the ones that take a String
>> as a performance bug, not as a design issue.
>>
>> The String method that takes a Charset should reuse the class-local
>> decoder
>> and the performance problem will go away.
>> (The analysis in StringCoding.decode(Charset, ...) (point 1) forgets
>> that initializing a decoder also has a cost)
>
> I do know the "slowness" comes from initializing cs.newDe/Encoder() :-)
> But it is just not "easy" to cache
> the de/encoder in this case. There is no guarantee that the cs passed
> in this time is the same one
> you had last time, even though the name might be the same.
I agree. You can load a new charset that overrides an existing one.
> Or even if the cs this time is indeed the same
> instance you had last time (the one you cached), there is no guarantee
> that the dec/enc returned from
> newDecoder()/Encoder() this time will be the same as the one in your cache,
> unless you invoke
> newDecoder()/Encoder(),
Here, you don't care whether it's the same encoder/decoder.
Any decoder/encoder is valid if the charset is the same (==) as
the thread-local one.
The spec allows reusing an encoder/decoder, so it's valid to reuse one
created from the same
charset. As I said, the javadoc does not guarantee that we
always call newEncoder/newDecoder,
and it even says that if you want control over the encoder/decoder you
should call newEncoder/newDecoder
yourself.
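
To make the idea concrete, here is a minimal sketch of a per-thread cache keyed on charset identity. It is not the JDK's StringCoding code: the class CachedEncoding, the nested Entry holder and the encode method are hypothetical names made up for illustration, and it only covers the encoding direction.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper for illustration only, not the JDK's StringCoding.
final class CachedEncoding {
    // Pairs a charset with an encoder that was created from it.
    private static final class Entry {
        final Charset cs;
        final CharsetEncoder enc;

        Entry(Charset cs) {
            this.cs = cs;
            this.enc = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
        }
    }

    // One cached entry per thread, so the encoder is never shared across threads.
    private static final ThreadLocal<Entry> CACHE = new ThreadLocal<>();

    static byte[] encode(String s, Charset cs) {
        Entry e = CACHE.get();
        // Reuse the cached encoder only if the charset is the very same object (==).
        if (e == null || e.cs != cs) {
            e = new Entry(cs);
            CACHE.set(e);
        }
        try {
            // encode(CharBuffer) resets the encoder before encoding, so reuse is safe.
            ByteBuffer bb = e.enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Calling CachedEncoding.encode("abc", cs) twice on the same thread with the same Charset object creates the encoder only once; a different Charset instance, even one with the same name, simply replaces the cached entry.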
> get the enc/dec and compare it to the one in your cache, but then why cache
> it :-) Something you can do is cache only if the cs passed in is
> indeed one from our own
> charset repository (which can be trusted not to do something
> tricky). You can check this by testing
> getClassLoader0() == null, which is expensive; I seem to remember the
> measurements showing this might
> not be worth doing the last time I was there.
Yes, weirdly, getClassLoader() is not a VM intrinsic, even though, for example,
getSuperclass(), which is
far more complex than getClassLoader(), is an intrinsic.
It would be a good idea to open a bug to create an intrinsic for
getClassLoader;
checking whether the classloader of a class is null is a common pattern.
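
For readers who have not seen the pattern, a rough sketch of the null-classloader check follows. It uses the public Class.getClassLoader() rather than the JDK-internal getClassLoader0() mentioned above; TrustedCharsets, isPlatformCharset and DemoCharset are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class TrustedCharsets {
    // A class defined by the bootstrap loader reports a null class loader,
    // so this is true for the charsets that ship inside the core JDK.
    static boolean isPlatformCharset(Charset cs) {
        return cs.getClass().getClassLoader() == null;
    }

    // A user-defined charset, loaded by the application class loader.
    // It delegates to UTF-8 just to keep the example short.
    static final class DemoCharset extends Charset {
        DemoCharset() {
            super("X-DEMO", null);
        }

        @Override
        public boolean contains(Charset cs) {
            return cs == this;
        }

        @Override
        public CharsetDecoder newDecoder() {
            return StandardCharsets.UTF_8.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return StandardCharsets.UTF_8.newEncoder();
        }
    }
}
```

With this, TrustedCharsets.isPlatformCharset(StandardCharsets.UTF_8) should hold, while an instance of DemoCharset is rejected because its class comes from a non-bootstrap loader.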
> Sure, those charsets in
> StandardCharsets can be treated specially, if desirable, probably only
> ASCII, ISO-8859-1 and UTF-8,
> such as
>
> if (cs == StandardCharsets.UTF_8 || cs == StandardCharsets.US_ASCII...) {
> ...
> }
I think it's better to reuse the caching mechanism used when the charset
is given as a String.
>
> I will do some measurements later to see whether separating the "else" part
> into a side method speeds things
> up a little; we can do that if the inlining does help, but not for 7 :-)
Yes, not for 7, but 8 is around the corner.
>
> Thanks!
> -Sherman
cheers,
Rémi