RFR [10]: 8186517: sun.nio.cs.StandardCharsets$Aliases and ClassMap can be lazily loaded

Martin Buchholz martinrb at google.com
Mon Aug 21 20:57:20 UTC 2017


RFC 3629 (https://tools.ietf.org/html/rfc3629) and Wikipedia
(https://en.wikipedia.org/wiki/UTF-8) agree that the name is "UTF-8".
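
To make the naming point concrete, here is a minimal sketch of how the
public API treats differently cased names. It uses only
java.nio.charset.Charset.forName and Charset.name; the class name
CharsetNameDemo and the helper canonicalName are made up for this
illustration and are not part of the patch under review.

```java
import java.nio.charset.Charset;

final class CharsetNameDemo {
    // Resolve a charset by the requested name (lookup is not
    // case-sensitive) and return the canonical name it reports.
    static String canonicalName(String requested) {
        return Charset.forName(requested).name();
    }

    public static void main(String[] args) {
        // All three spellings resolve to the same charset.
        for (String s : new String[] {"UTF-8", "utf-8", "Utf-8"}) {
            System.out.println(s + " -> " + canonicalName(s));
        }
    }
}
```

Whatever case the caller uses, the charset that comes back reports its
canonical name as "UTF-8".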

On Mon, Aug 21, 2017 at 1:24 PM, Xueming Shen <xueming.shen at oracle.com>
wrote:

> On 8/21/17, 1:07 PM, Martin Buchholz wrote:
>
> Xueming, we should assume by default that people will use proper names,
> especially in real code, where sloppiness should be left behind.  The real
> name of the UTF-8 charset is "UTF-8".  Reward careful coders; punish sloppy
> ones.
>
>
> both our spec and the IANA spec say the charset name is NOT
> "case-sensitive", so strictly speaking it's not "sloppy" to use
> lowercase for the charset name.
>
> but i have to admit it does look sloppy to spell the name in this case
> combination :-)
>
>
>
> On Mon, Aug 21, 2017 at 12:45 PM, Xueming Shen <xueming.shen at oracle.com>
> wrote:
>
>> On 8/21/17, 12:04 PM, Martin Buchholz wrote:
>>
>> OK, but ...
>>
>> I'd like to see further improvements here later, like switching to upper
>> case.
>>
>>
>> what's the benefit of switching to upper case? i would assume the
>> original assumption is that people tend to use lower case charset
>> names in their code; in that case (if the assumption is correct) the
>> "toLower()" then needs to do nothing.
>>
>> the aliases and classes mappings are generated at build time, so it
>> does not matter whether they are lowercase or uppercase
>>
>>
>>
>> I just realized we have
>> java/nio/charset/StandardCharsets.java
>> sun/nio/cs/StandardCharsets.java
>>
>> and they both have a UTF_8 field !
>>
>>
>>
>> On Mon, Aug 21, 2017 at 11:53 AM, Claes Redestad <
>> claes.redestad at oracle.com> wrote:
>>
>>>
>>> On 2017-08-21 20:05, Martin Buchholz wrote:
>>>
>>>> I agree we should optimize for common charset names, in part to help
>>>> the world move to UTF-8.
>>>>
>>>
>>> Agreed.
>>>
>>>
>>>> It's *weird* to canonicalize to lower case, when the canonical charset
>>>> names are all uppercase ("UTF-8" instead of "utf-8").
>>>>
>>>
>>> A pre-existing weirdness, and it goes deep enough that I haven't dared
>>> changing it.
>>>
>>>
>>>> ---
>>>>    62     public static final String UTF_8 = "UTF-8";
>>>> Is this still used?
>>>>
>>>> Maybe the very first thing lookup() should do is check
>>>> charsetName == UTF_8
>>>>
>>>
>>> Subsequent lookups are very likely to hit the two-element cache in
>>> Charset, so I've not seen this add up.
>>>
>>>
>>>> ---
>>>>
>>>> Is switching from char[] to StringBuilder really an improvement?
>>>> Charset names are all short, so the cost of copying the char[] to a byte[]
>>>> is negligible.
>>>>
>>>
>>> This allows us to not load and touch the code to deflate a char[] to a
>>> byte[] (StringUTF16), so a tiny, tiny startup win. Throughput-wise it's
>>> likely no different.
>>>
>>> /Claes
>>>
>>>
>>>>
>>>> On Mon, Aug 21, 2017 at 6:46 AM, Claes Redestad <
>>>> claes.redestad at oracle.com> wrote:
>>>>
>>>>     Hi,
>>>>
>>>>     the Aliases and Classes inner classes in StandardCharsets can be
>>>>     lazily loaded by restructuring how we check for the three
>>>>     default-loaded charsets. This removes some classloading and
>>>>     other work from critical phases of VM startup, and is a net
>>>>     gain on any system that defaults to one of the three standard
>>>>     charsets (UTF-8, Latin-1, ASCII).
>>>>
>>>>     Webrev: http://cr.openjdk.java.net/~redestad/8186517/jdk.00/
>>>>     Bug: https://bugs.openjdk.java.net/browse/JDK-8186517
>>>>
>>>>     I'm not sure if the pre-existing optimization to allow
>>>>     StandardCharsets.charsets() unsynchronized access to internals
>>>>     is really necessary (or even 100% correct), but by ensuring we
>>>>     retrieve the Aliases and Classes instances in a synchronized block
>>>>     we should be no worse off semantically here.
>>>>
>>>>     /Claes
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
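
For readers following the thread, the lazy-loading idea in the original
RFR can be sketched roughly as below. This is not the code from the
webrev: the class LazyCharsetLookup, the nested AliasTable holder, the
alias entries, and the aliasTableLoads counter are all hypothetical
names chosen for illustration. The sketch assumes a fast path that
checks the three default charsets by canonical name, and a nested
holder class whose static initializer runs only on first use of the
alias table.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class LazyCharsetLookup {
    // Demonstration-only counter: how many times the alias table was built.
    static int aliasTableLoads = 0;

    // Holder class: the JVM initializes it on first access to MAP,
    // not when LazyCharsetLookup itself is loaded.
    private static final class AliasTable {
        static final Map<String, String> MAP = build();

        private static Map<String, String> build() {
            aliasTableLoads++;
            Map<String, String> m = new HashMap<>();
            // Hypothetical lower-cased alias -> canonical name entries.
            m.put("utf8", "UTF-8");
            m.put("latin1", "ISO-8859-1");
            m.put("ascii", "US-ASCII");
            return m;
        }
    }

    // Returns the charset for the given name, or null if unknown.
    static Charset lookup(String name) {
        // Fast path: canonical names never touch the alias table.
        if (name.equals("UTF-8")) return StandardCharsets.UTF_8;
        if (name.equals("ISO-8859-1")) return StandardCharsets.ISO_8859_1;
        if (name.equals("US-ASCII")) return StandardCharsets.US_ASCII;
        // Slow path: first use triggers AliasTable initialization.
        String canonical = AliasTable.MAP.get(name.toLowerCase(Locale.ROOT));
        return canonical == null ? null : lookup(canonical);
    }
}
```

With this shape, a caller that passes "UTF-8" never causes the alias
table to be built; only a non-canonical name such as "utf8" pays that
cost, and only once.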