Review (Updated) : 4884238 : Constants for Standard Charsets

Ulf Zibis Ulf.Zibis at gmx.de
Wed Apr 20 11:47:03 UTC 2011


On 20.04.2011 02:23, Mike Duigou wrote:
> On Apr 19 2011, at 04:52 , Ulf Zibis wrote:
>> I think we should catch the problem at the source. ... In my approach from Bug 100098 - Make sun.nio.cs.* charset objects light-weight, such a class is named 'FastCharset'.
> Unfortunately we are at a very late stage in Java 7's development, and the degree of change required by 100098 is not possible.

Yes, that's right. For that reason I have suggested an intermediate solution that avoids adding a 
separate StandardCharset(s) class, which could not be removed later.

>   This issue itself may also be rejected solely for missing impending deadlines.
Is there anything that should be done about that?

>> So I tend to prefer the original request from 4884238 (have the canonical names as constants), as the lookup via Charset.forName(...) could then be very fast compared to the heavy decoding/encoding work that follows anyway.
> I think that in most uses a constant of the Charset is more useful as that's what's desired for use. I am not opposed to having visible constants for the charset names but I don't think it's as useful as the Charset objects. The performance of Charset.forName() is a separate matter.

Thinking about it more, I agree that some direct access to the standard charsets is useful, because 
Charset.forName(..) potentially needs extra exception handling. But must these be static constants? 
I think we could have lazily initialized static methods, so that (1) only the requested Charset class 
is loaded, (2) a separate StandardCharset(s) class and the discussion about its naming become 
superfluous, (3) the small shortcut cache in Charset retains its proper function (otherwise the last 
two loaded, UTF_16BE and UTF_16LE, are cached), and (4) we could profit from it in 
Charset.defaultCharset() for the fall-back case:
     defaultCharset = UTF_8();
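
One way to get lazily initialized static accessors, sketched here only as an illustration, is the initialization-on-demand holder idiom. The class LazyCharsets and its nested holder classes are hypothetical names and not part of the JDK; the sketch assumes that deferring each lookup until the first call is the goal:

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK.
final class LazyCharsets {
    private LazyCharsets() {}

    // A nested holder class is initialized only when its accessor is first
    // called, so a charset that is never requested is never looked up.
    private static final class Utf8Holder {
        static final Charset INSTANCE = Charset.forName("UTF-8");
    }

    private static final class Utf16BeHolder {
        static final Charset INSTANCE = Charset.forName("UTF-16BE");
    }

    static Charset UTF_8() {
        return Utf8Holder.INSTANCE;
    }

    static Charset UTF_16BE() {
        return Utf16BeHolder.INSTANCE;
    }
}
```

Callers then need no exception handling at the call site, e.g. Charset cs = LazyCharsets.UTF_8();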

I still think we should have constants for the canonical charset names: Charset.UTF_8 = "UTF-8"; etc.

Additionally, consider that in many real-world cases it is not the charset but its decoder/encoder 
that is of interest, so the programmer needs to define a static constant anyway if it should be 
reused for performance reasons:
     static final CharsetDecoder UTF_8_DECODER = UTF_8.newDecoder();
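
A caveat for decoder reuse: CharsetDecoder instances are not safe for use by multiple concurrent threads, so a single shared static decoder only works in single-threaded code. A minimal sketch of per-thread reuse follows; the class name Utf8Decoding and its decode helper are hypothetical and chosen for illustration only:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical helper class, not part of the JDK.
final class Utf8Decoding {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // One decoder per thread, created on first use in that thread.
    private static final ThreadLocal<CharsetDecoder> DECODER =
        new ThreadLocal<CharsetDecoder>() {
            @Override
            protected CharsetDecoder initialValue() {
                return UTF_8.newDecoder();
            }
        };

    private Utf8Decoding() {}

    static String decode(byte[] bytes) {
        try {
            // decode(ByteBuffer) resets the decoder and performs a complete
            // decoding operation, so the instance can be reused across calls.
            return DECODER.get().decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
    }
}
```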

Here is my new suggestion:

public abstract class Charset implements Comparable<Charset> {
     static final String UTF_8 = "UTF-8";
     ...
     static final Charset UTF_8() {
         return forName(UTF_8); // Note that recently used charsets are held in a small shortcut cache.
     }
     ...
}
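
Since java.nio.charset.Charset cannot be modified from user code, the suggestion above can be tried out in a stand-alone class. This is only a sketch; the class name CharsetShortcuts is hypothetical. It relies on Java allowing a field and a method to share the same name:

```java
import java.nio.charset.Charset;

// Hypothetical stand-in for the proposed additions to Charset.
final class CharsetShortcuts {
    // Canonical charset name as a constant.
    static final String UTF_8 = "UTF-8";

    private CharsetShortcuts() {}

    // Accessor that delegates to Charset.forName(...) on every call and
    // relies on that method's internal lookup being cheap for recent names.
    static Charset UTF_8() {
        return Charset.forName(UTF_8);
    }
}
```

Usage would look like: byte[] bytes = "text".getBytes(CharsetShortcuts.UTF_8());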

-Ulf




