RFR [10]: 8186517: sun.nio.cs.StandardCharsets$Aliases and ClassMap can be lazily loaded

Xueming Shen xueming.shen at oracle.com
Mon Aug 21 20:24:11 UTC 2017


On 8/21/17, 1:07 PM, Martin Buchholz wrote:
> xUEMING, we should assume by default that people will use proper 
> names, especially in real code, where sloppiness should be left 
> behind.  The real name of the UTF-8 charset is "UTF-8".  Reward 
> careful coders; punish sloppy ones.
>

both our spec and the IANA spec say the charset name is NOT
case-sensitive, so strictly speaking it's not "sloppy" to use
lowercase for the charset name.

but i have to admit it does look sloppy to spell the name in this case 
combination :-)
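
to illustrate the case-insensitivity point, here is a minimal sketch using
only the public java.nio.charset.Charset API. the class and method names
(CharsetNameCase, sameCharset) are made up for this example and are not
part of the webrev under review:

```java
import java.nio.charset.Charset;

public class CharsetNameCase {
    // Returns true when both names resolve to the same Charset.
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        // Lookup ignores case, so all of these name the same charset.
        System.out.println(sameCharset("utf-8", "UTF-8"));   // true
        System.out.println(sameCharset("Utf-8", "UTF-8"));   // true
        // The canonical name reported back is the uppercase form.
        System.out.println(Charset.forName("utf-8").name()); // UTF-8
    }
}
```

whichever spelling the caller uses, the lookup has to normalize the name
before consulting the generated alias table, which is where the
lower-vs-upper choice discussed in this thread comes in.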


> On Mon, Aug 21, 2017 at 12:45 PM, Xueming Shen
> <xueming.shen at oracle.com> wrote:
>
>     On 8/21/17, 12:04 PM, Martin Buchholz wrote:
>>     OK, but ...
>>
>>     I'd like to see further improvements here later, like switching
>>     to upper case.
>
>     what's the benefit of switching to upper case? i would assume the
>     original assumption is that people tend to use lower case charset
>     names in their code, in that case (if the assumption is correct)
>     the "toLower()" then needs to do nothing.
>
>     the aliases and classes mapping are generated during build time,
>     so it does not matter whether it's lowercase or uppercase
>
>
>>
>>     I just realized we have
>>     java/nio/charset/StandardCharsets.java
>>     sun/nio/cs/StandardCharsets.java
>>
>>     and they both have a UTF_8 field !
>>
>>
>>
>>     On Mon, Aug 21, 2017 at 11:53 AM, Claes Redestad
>>     <claes.redestad at oracle.com> wrote:
>>
>>
>>         On 2017-08-21 20:05, Martin Buchholz wrote:
>>
>>             I agree we should optimize for common charset names, in
>>             part to help the world move to UTF-8.
>>
>>
>>         Agreed.
>>
>>
>>             It's *weird* to canonicalize to lower case, when the
>>             canonical charset names are all uppercase ("UTF-8"
>>             instead of "utf-8").
>>
>>
>>         A pre-existing weirdness, and it goes deep enough that I
>>         haven't dared changing it.
>>
>>
>>             ---
>>                62     public static final String UTF_8 = "UTF-8";
>>             Is this still used?
>>
>>             Maybe the very first thing lookup() should do is check
>>             charsetName == UTF_8
>>
>>
>>         Subsequent lookups are very likely to hit the two-element
>>         cache in Charset, so I've not seen this add up.
>>
>>
>>             ---
>>
>>             Is switching from char[] to StringBuilder really an
>>             improvement?  Charset names are all short, so the cost of
>>             copying the char[] to a byte[] is negligible.
>>
>>
>>         This allows us to not load and touch the code to deflate a
>>         char[] to a byte[] (StringUTF16), so a tiny, tiny startup
>>         win. Throughput-wise it's likely no different.
>>
>>         /Claes
>>
>>
>>
>>             On Mon, Aug 21, 2017 at 6:46 AM, Claes Redestad
>>             <claes.redestad at oracle.com> wrote:
>>
>>                 Hi,
>>
>>                 the Aliases and Classes inner classes in StandardCharsets
>>                 can be lazily-loaded by restructuring how we check for the
>>                 three default-loaded charsets. This removes some
>>                 classloading and work from critical phases of the VM
>>                 startup, and is a net gain on any systems that default
>>                 to any of the three standard charsets (UTF-8, Latin-1,
>>                 ASCII).
>>
>>                 Webrev: http://cr.openjdk.java.net/~redestad/8186517/jdk.00/
>>                 Bug: https://bugs.openjdk.java.net/browse/JDK-8186517
>>
>>                 I'm not sure if the pre-existing optimization to allow
>>                 StandardCharsets.charsets() unsynchronized access to
>>                 internals is really necessary (or even 100% correct), but
>>                 by ensuring we retrieve the Aliases and Classes instances
>>                 in a synchronized block we should be no worse off
>>                 semantically here.
>>
>>                 /Claes
>>
>>
>>
>>
>
>
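
for readers following the original proposal quoted above, the general
lazy-loading idea can be sketched with a nested holder class. this is
only an illustration under stated assumptions: LazyAliasDemo, its
Aliases holder, the aliasesLoaded flag and the three alias entries are
all hypothetical, and none of it is the actual
sun.nio.cs.StandardCharsets code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LazyAliasDemo {
    // Demonstration-only flag showing when the holder gets initialized.
    static boolean aliasesLoaded = false;

    // Holder class: the JVM initializes it on first access to MAP,
    // not when LazyAliasDemo itself is loaded.
    private static final class Aliases {
        static final Map<String, String> MAP = new HashMap<>();
        static {
            aliasesLoaded = true;
            // Hypothetical sample entries, keyed by lowercase alias.
            MAP.put("utf8", "UTF-8");
            MAP.put("latin1", "ISO-8859-1");
            MAP.put("ascii", "US-ASCII");
        }
    }

    // Returns the canonical name, or null if the name is unknown here.
    static String canonicalName(String name) {
        // Fast path: the three standard charsets never touch the map.
        if (name.equals("UTF-8") || name.equals("ISO-8859-1")
                || name.equals("US-ASCII")) {
            return name;
        }
        // Slow path: first use triggers initialization of Aliases.
        return Aliases.MAP.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("UTF-8")); // UTF-8
        System.out.println(aliasesLoaded);          // false
        System.out.println(canonicalName("utf8"));  // UTF-8
        System.out.println(aliasesLoaded);          // true
    }
}
```

a system whose default charset is one of the three standard ones stays
on the fast path, so the alias table is never built during startup.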
