RFR [10]: 8186517: sun.nio.cs.StandardCharsets$Aliases and ClassMap can be lazily loaded
Xueming Shen
xueming.shen at oracle.com
Mon Aug 21 20:24:11 UTC 2017
On 8/21/17, 1:07 PM, Martin Buchholz wrote:
> xUEMING, we should assume by default that people will use proper
> names, especially in real code, where sloppiness should be left
> behind. The real name of the UTF-8 charset is "UTF-8". Reward
> careful coders; punish sloppy ones.
>
both our spec and the IANA spec say the charset name is NOT
"case-sensitive", so strictly speaking it's not "sloppy" to use
lowercase for the charset name. but i have to admit it does look
sloppy to spell the name in this case combination :-)
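
A small illustration of the case-insensitivity point, using only the public java.nio.charset API. The class name CharsetNameCase and the helper resolve() are made up for this sketch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only Charset.forName and StandardCharsets are real API.
final class CharsetNameCase {
    // Charset.forName accepts a canonical name or an alias, in any letter case.
    static Charset resolve(String name) {
        return Charset.forName(name);
    }

    public static void main(String[] args) {
        // All three spellings resolve to the same charset as StandardCharsets.UTF_8.
        for (String name : new String[] {"UTF-8", "utf-8", "uTf-8"}) {
            Charset cs = resolve(name);
            System.out.println(name + " -> " + cs.name() + " "
                    + cs.equals(StandardCharsets.UTF_8));
        }
    }
}
```

Whatever the spelling passed in, name() reports the canonical "UTF-8".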
> On Mon, Aug 21, 2017 at 12:45 PM, Xueming Shen
> <xueming.shen at oracle.com> wrote:
>
> On 8/21/17, 12:04 PM, Martin Buchholz wrote:
>> OK, but ...
>>
>> I'd like to see further improvements here later, like switching
>> to upper case.
>
> what's the benefit of switching to upper case? i would assume the
> original assumption is that people tend to use lower case charset
> names in their code; in that case (if the assumption is correct)
> the "toLower()" then needs to do nothing.
>
> the aliases and classes mappings are generated at build time, so
> it does not matter whether they are lowercase or uppercase.
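
A rough sketch of the "toLower() needs to do nothing" argument. AsciiLower is a hypothetical helper written for this mail, not the code in sun.nio.cs, and it assumes ASCII-only lowering. The point is that an already-lowercase name is returned as-is, with no copy:

```java
// Hypothetical helper, not the JDK's implementation.
final class AsciiLower {
    static String toLower(String s) {
        int n = s.length();
        int i = 0;
        // Scan for the first upper-case ASCII letter.
        while (i < n) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                break;
            }
            i++;
        }
        if (i == n) {
            return s; // already lower case: same instance, no allocation
        }
        // Slow path: copy the unchanged prefix, then lower the rest.
        StringBuilder sb = new StringBuilder(n);
        sb.append(s, 0, i);
        for (; i < n; i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + ('a' - 'A')) : c);
        }
        return sb.toString();
    }
}
```

With lowercase lookup keys, callers who write "utf-8" take the fast path; with uppercase keys it would be callers who write "UTF-8". Which of the two is more common in real code is the open question in this thread.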
>
>
>>
>> I just realized we have
>> java/nio/charset/StandardCharsets.java
>> sun/nio/cs/StandardCharsets.java
>>
>> and they both have a UTF_8 field !
>>
>>
>>
>> On Mon, Aug 21, 2017 at 11:53 AM, Claes Redestad
>> <claes.redestad at oracle.com> wrote:
>>
>>
>> On 2017-08-21 20:05, Martin Buchholz wrote:
>>
>> I agree we should optimize for common charset names, in
>> part to help the world move to UTF-8.
>>
>>
>> Agreed.
>>
>>
>> It's *weird* to canonicalize to lower case, when the
>> canonical charset names are all uppercase ("UTF-8"
>> instead of "utf-8").
>>
>>
>> A pre-existing weirdness, and it goes deep enough that I
>> haven't dared to change it.
>>
>>
>> ---
>> 62 public static final String UTF_8 = "UTF-8";
>> Is this still used?
>>
>> Maybe the very first thing lookup() should do is check
>> charsetName == UTF_8
>>
>>
>> Subsequent lookups are very likely to hit the two-element cache
>> in Charset, so I've not seen this add up.
>>
>>
>> ---
>>
>> Is switching from char[] to StringBuilder really an
>> improvement? Charset names are all short, so the cost of
>> copying the char[] to a byte[] is negligible.
>>
>>
>> This allows us to not load and touch the code to deflate a
>> char[] to a byte[] (StringUTF16), so a tiny, tiny startup
>> win. Throughput-wise it's likely no different.
>>
>> /Claes
>>
>>
>>
>> On Mon, Aug 21, 2017 at 6:46 AM, Claes Redestad
>> <claes.redestad at oracle.com> wrote:
>>
>> Hi,
>>
>> the Aliases and Classes inner classes in StandardCharsets can be
>> lazily-loaded by restructuring how we check for the three
>> default-loaded charsets. This keeps some classloading and work
>> out of critical phases of VM startup, and is a net gain on any
>> system that defaults to one of the three standard charsets
>> (UTF-8, Latin-1, ASCII).
>>
>> Webrev: http://cr.openjdk.java.net/~redestad/8186517/jdk.00/
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8186517
>>
>> I'm not sure if the pre-existing optimization to allow
>> StandardCharsets.charsets() unsynchronized access to internals
>> is really necessary (or even 100% correct), but by ensuring we
>> retrieve the Aliases and Classes instances in a synchronized
>> block we should be no worse off semantically here.
>>
>> /Claes
>>
>>
>>
>>
>
>
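
For readers without the webrev at hand, here is a minimal sketch of the general idea behind the original proposal: answer the three standard charsets on a fast path, and only touch a nested holder class (which the JVM initializes on first use) for everything else. This is not the code in the webrev. LazyAliasLookup, its alias entries and the aliasesLoaded flag are all made up for illustration, and the synchronization question raised above is left out:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lazy holder initialization; not sun.nio.cs.StandardCharsets.
final class LazyAliasLookup {
    // Only here to make the laziness observable in this example.
    static volatile boolean aliasesLoaded = false;

    // The JVM initializes this nested class on first access to MAP,
    // not when LazyAliasLookup itself is loaded.
    private static final class Aliases {
        static final Map<String, String> MAP = new HashMap<>();
        static {
            // Illustrative entries only; the real tables are generated at build time.
            MAP.put("utf8", "UTF-8");
            MAP.put("latin1", "ISO-8859-1");
            MAP.put("ascii", "US-ASCII");
            aliasesLoaded = true;
        }
    }

    // Expects an already lower-cased name; returns null if unknown.
    static String canonicalName(String lowerCaseName) {
        switch (lowerCaseName) {
            // Fast path: the three standard charsets never touch Aliases.
            case "utf-8":      return "UTF-8";
            case "iso-8859-1": return "ISO-8859-1";
            case "us-ascii":   return "US-ASCII";
            default:           return Aliases.MAP.get(lowerCaseName);
        }
    }
}
```

A system whose default charset is one of the three fast-path names never triggers the Aliases initializer during startup in this sketch; the first lookup by alias does.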
More information about the nio-dev
mailing list