Review (Updated) : 4884238 : Constants for Standard Charsets

Tue Apr 19 11:52:02 UTC 2011

Am 19.04.2011 00:24, schrieb Mike Duigou:
> Hello All;
>
> I have updated the webrev for the standard Charset constants based upon feedback received from the earlier reviews. The constants class is now named StandardCharset rather than Charsets in mimicry of the class naming used by NIO.2 (JSR 203). According to the conventions used by JSR 203 and elsewhere in JDK a "Charsets" class would be for static utility methods rather than solely constants.
>
> The DEFAULT_CHARSET definition is also now removed. The Charset.getDefaultCharset() method makes it more clear that the value is not a constant across all all JVM instances.
>
> Also now included is a small unit test and replacement of several uses of Charset.forname() for standard charset names within the platform.
>
> The latest webrev : http://cr.openjdk.java.net/~mduigou/4884238/2/webrev/
>
> Any other remaining feeback?
>

Reading 'StandardCharset' one expects _the_ standard charset, but we have a collection of 6 here, so 
I'm for 'StandardCharsets' similar to e.g. j.u.z.ZipConstants, j.u.z.ZipConstants64, ...

I'm still against this additional class and vote for adding these constants to existing Charset. We 
also don't have things like StandardDouble(s).NaN.
Your rationale is to avoid 6 charset classes to be loaded (which I still think is nothing against 
the initialization of the existing Charset class) on very early state when instantiation of IN, OUT, 
ERR forces Charset class to load for the lookup of the default charset.

I think, we should *catch the problem at the source*. The source is, that the given sun.nio.cs 
charset providers supply one distinct class for each charset, which is completely unnecessary. We 
only need distinct classes for each CharsetDecoder and CharsetEncoder to have them fast.
In my approach from ***Bug 100098* <https://bugs.openjdk.java.net/show_bug.cgi?id=100098> - Make 
sun.nio.cs.* charset objects light-weight**such a class is named 'FastCharset'.
As you can see there, in my FastCharsetProviderclass, the lookup of a charset doesn't anymore need a 
Class.forName(...) call, which should be much faster then before.

In the meantime we could have something (which demonstrates the advantage to have the canonical 
names as constant) like:

package sun.nio.cs;
public class FastCharsetProvider extends CharsetProvider {

     public final Charset charsetForName(String charsetName) {
         switch (charsetName) {
             case US_ASCII : return new StandardCharset(US_ASCII);
             case ISO_8859_1 : return new StandardCharset(ISO_8859_1);
             case UTF_8 : return new StandardCharset(UTF_8);
             case UTF_16 : return new StandardCharset(UTF_16);
             case UTF_16BE : return new StandardCharset(UTF_16BE);
             case UTF_16LE : return new StandardCharset(UTF_16LE);
             default : synchronized (this) { return lookup(canonicalize(charsetName)); }
     }

     final class StandardCharset extends Charset {
         final String name;

         newDecoder() {
             switch (charsetName) {
                 case US_ASCII : return new US_ASCII.Decoder();
                 case ISO_8859_1 : return new ISO_8859_1.Decoder();
                 case UTF_8 : return new UTF_8.Decoder();
                 case UTF_16 : return new UTF_16.Decoder();
                 case UTF_16BE : return new UTF_16BE.Decoder();
                 case UTF_16LE : return new UTF_16LE.Decoder();
                 default : return Charset.defaultCharset().newDecoder();
         }
         newEncoder() { ... }
         ...

     }

}

So I tend to prefer the original request from 4884238 (have the canonical names as constants), as 
the lookup via Charset.forName(...) then could be very fast compared to the anyway following heavy 
de/encoding work.

In that case we additionally could avoid such weird constructs
(potentially superfluously forces the UTF-8 charset class to be loaded):
  114     private ZipCoder(Charset cs) {
  115         this.cs = cs;
  116         this.isutf8 = cs.name().equals(StandardCharset.UTF_8.name());
  117     }

Instead we could have:
         this.isutf8 = cs.name().equals(Charset.UTF_8);

-Ulf