RFR [10]: 8186517: sun.nio.cs.StandardCharsets$Aliases and ClassMap can be lazily loaded

Mon Aug 21 20:07:18 UTC 2017

Xueming, we should assume by default that people will use proper names,
especially in real code, where sloppiness should be left behind.  The real
name of the UTF-8 charset is "UTF-8".  Reward careful coders; punish sloppy
ones.
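
A minimal sketch of that idea, not the code in the webrev: an exact match
on the canonical name is tried first, and only other spellings pay for
case folding and a map lookup. The class name, method name, and the small
alias table are hypothetical and for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: canonical-name fast path, lower-cased fallback.
final class CanonicalFirstLookup {
    // Illustrative subset of names, keyed by lower-cased spelling.
    private static final Map<String, Charset> BY_LOWER_CASE_NAME = Map.of(
        "utf-8", StandardCharsets.UTF_8,
        "utf8", StandardCharsets.UTF_8,
        "iso-8859-1", StandardCharsets.ISO_8859_1,
        "latin1", StandardCharsets.ISO_8859_1,
        "us-ascii", StandardCharsets.US_ASCII,
        "ascii", StandardCharsets.US_ASCII);

    static Charset lookup(String name) {
        // Fast path: the canonical spelling needs no case folding at all.
        if ("UTF-8".equals(name)) {
            return StandardCharsets.UTF_8;
        }
        // Slow path: fold case, then consult the table (null if unknown).
        return BY_LOWER_CASE_NAME.get(name.toLowerCase(Locale.ROOT));
    }
}
```

With this shape, lookup("UTF-8") never touches the table, while
lookup("utf8") and lookup("Latin1") still resolve through the fallback.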

On Mon, Aug 21, 2017 at 12:45 PM, Xueming Shen <xueming.shen at oracle.com>
wrote:

> On 8/21/17, 12:04 PM, Martin Buchholz wrote:
>
> OK, but ...
>
> I'd like to see further improvements here later, like switching to upper
> case.
>
>
> What's the benefit of switching to upper case? I would assume the original
> assumption is that people tend to use lower-case charset names in their
> code; in that case (if the assumption is correct) "toLower()" then needs
> to do nothing.
>
> The alias and class mappings are generated at build time, so it does not
> matter whether they are lowercase or uppercase.
>
>
>
> I just realized we have
> java/nio/charset/StandardCharsets.java
> sun/nio/cs/StandardCharsets.java
>
> and they both have a UTF_8 field!
>
>
>
> On Mon, Aug 21, 2017 at 11:53 AM, Claes Redestad <
> claes.redestad at oracle.com> wrote:
>
>>
>> On 2017-08-21 20:05, Martin Buchholz wrote:
>>
>>> I agree we should optimize for common charset names, in part to help the
>>> world move to UTF-8.
>>>
>>
>> Agreed.
>>
>>
>>> It's *weird* to canonicalize to lower case, when the canonical charset
>>> names are all uppercase ("UTF-8" instead of "utf-8").
>>>
>>
>> A pre-existing weirdness, and it goes deep enough that I haven't dared
>> to change it.
>>
>>
>>> ---
>>>    62     public static final String UTF_8 = "UTF-8";
>>> Is this still used?
>>>
>>> Maybe the very first thing lookup() should do is check
>>> charsetName == UTF_8
>>>
>>
>> Subsequent lookups are very likely to hit the two-element cache in
>> Charset, so I've not seen this add up.
>>
>>
>>> ---
>>>
>>> Is switching from char[] to StringBuilder really an improvement?
>>> Charset names are all short, so the cost of copying the char[] to a byte[]
>>> is negligible.
>>>
>>
>> This allows us to not load and touch the code to deflate a char[] to a
>> byte[] (StringUTF16), so a tiny, tiny startup win. Throughput-wise it's
>> likely no different.
>>
>> /Claes
>>
>>
>>>
>>> On Mon, Aug 21, 2017 at 6:46 AM, Claes Redestad <
>>> claes.redestad at oracle.com> wrote:
>>>
>>>     Hi,
>>>
>>>     the Aliases and Classes inner classes in StandardCharsets can be
>>>     lazily-loaded by restructuring how we check for the three
>>>     default-loaded charsets. This removes some classloading and
>>>     work from critical phases of VM startup, and is a net gain on
>>>     any system that defaults to one of the three standard charsets
>>>     (UTF-8, Latin-1, ASCII).
>>>
>>>     Webrev: http://cr.openjdk.java.net/~redestad/8186517/jdk.00/
>>>     Bug: https://bugs.openjdk.java.net/browse/JDK-8186517
>>>
>>>     I'm not sure if the pre-existing optimization to allow
>>>     StandardCharsets.charsets() unsynchronized access to internals
>>>     is really necessary (or even 100% correct), but by ensuring we
>>>     retrieve the Aliases and Classes instances in a synchronized block
>>>     we should be no worse off semantically here.
>>>
>>>     /Claes
>>>
>>>
>>>
>>
>
>
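
The lazy-loading idea in the original RFR quoted above can be sketched
with the initialization-on-demand holder idiom. This is an assumption
about the general shape, not the actual patch: the class names, the
aliasesLoaded flag, and the alias table are hypothetical, and the
synchronized block discussed in the RFR is omitted.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the alias table lives in a nested holder class,
// so it is only initialized the first time a non-canonical name is seen.
final class LazyAliasesSketch {
    // Set by the holder's static initializer; for demonstration only.
    static boolean aliasesLoaded = false;

    private static final class Aliases {
        // Illustrative subset of aliases, keyed by lower-cased spelling.
        static final Map<String, String> MAP = Map.of(
            "utf8", "UTF-8",
            "latin1", "ISO-8859-1",
            "ascii", "US-ASCII");

        static {
            aliasesLoaded = true;
        }
    }

    static String canonicalName(String name) {
        // The three default charsets are checked without touching Aliases.
        if (name.equals("UTF-8") || name.equals("ISO-8859-1")
                || name.equals("US-ASCII")) {
            return name;
        }
        // First access to Aliases.MAP triggers initialization of the holder.
        return Aliases.MAP.get(name.toLowerCase(Locale.ROOT));
    }
}
```

A call such as canonicalName("UTF-8") leaves aliasesLoaded false; the
holder is initialized only once a name like "latin1" is looked up.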