Constants for standard charsets -- CR #4884238

Mike Duigou mike.duigou at oracle.com
Tue Apr 12 17:38:17 UTC 2011


On Apr 12 2011, at 03:33 , Alan Bateman wrote:

> Alan Bateman wrote:
>> I see your mail in the archives:
>> 
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006487.html 
>> 
>> but I didn't receive it. I had a similar issue yesterday on another list but I've no idea where the problem is.
>> 
>> -Alan
>> 
> Just a couple of initial comments on the webrev:
> 
> 1. In the standard charsets section of the class description then it might be useful to include a reference to Charsets, maybe "The {@link Charsets} class defines constants for each of the standard charsets".

OK
> 
> 2. @see Charsets.DEFAULT, I assume this should be @see Charsets#DEFAULT_CHARSET

Correct. I changed it to DEFAULT_CHARSET and forgot to fix this link.

> 
> 3. Looks like Charsets is using 2 rather than 4-space indenting.

Oops, I will correct this.
> 
> 
> 4. It would be nice to update java.nio.file.Path's class description to replace Charset.forName("UTF-8") with Charsets.UTF_8;

I will do so.
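
As a rough illustration of comment 4, here is a minimal sketch contrasting a lookup by name with a constant. The Charsets class from the webrev is not reproduced here. The sketch assumes java.nio.charset.StandardCharsets (available from JDK 7) as a stand-in for it, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetConstantDemo {
    // Lookup by name: a misspelled name is only detected at run time.
    static Charset byName() {
        return Charset.forName("UTF-8");
    }

    // Constant: checked by the compiler, no string lookup at the call site.
    // StandardCharsets is an assumed stand-in for the Charsets class under review.
    static Charset byConstant() {
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // Both calls yield the same charset; Charset equality is based on the canonical name.
        System.out.println(byName().equals(byConstant()));
        // "caf\u00e9" is three ASCII characters plus one two-byte character in UTF-8.
        System.out.println("caf\u00e9".getBytes(byConstant()).length);
    }
}
```

The constant form avoids both the string literal and the run-time lookup, which is the motivation for the Path documentation change.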

> I was thinking more about DEFAULT_CHARSET and I'm not sure that we really need it. In the java.io package then all constructors that take a Charset also have a constructor that uses the default charset, same thing in java.lang.String and java.util.zip package. In javax.tools.JavaCompiler I see that null can be used to select the default charset. In java.nio.file.Files then we didn't include versions of readAllLines, newBufferedReader, etc. that didn't take a Charset parameter.

I agree that requiring an explicit Charset is best because it makes it clear what charset is being used. For me, though, this argues for the DEFAULT_CHARSET declaration, because it's best to make it obvious that the default charset is being used.

I always interpret content being accessed with the default charset in one of two ways:
- Content that's known to be private and that the JVM wrote itself. Useful for caches because it's assumed that the default charset is the most efficient for that platform & configuration.
- Content that's potentially uninterpretable because it has an unknown charset and the default charset is the fallback choice. In recent times I've considered switching to using UTF-8 for unknown content.

Charset.defaultCharset() is possibly just as clear. I personally would use the constant and use only Charsets constants for accessing content.
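
A minimal sketch of the two readings of "default charset" described above, written so that the choice is visible at each call site. The DEFAULT_CHARSET field below is a hypothetical stand-in for the proposed constant, initialized from Charset.defaultCharset(). The class and method names are also assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitDefaultDemo {
    // Hypothetical stand-in for the proposed Charsets.DEFAULT_CHARSET constant.
    static final Charset DEFAULT_CHARSET = Charset.defaultCharset();

    // Case 1: private content this JVM wrote itself (for example a cache entry).
    // Encoding and decoding both name the default charset explicitly.
    static String roundTripPrivate(String text) {
        byte[] stored = text.getBytes(DEFAULT_CHARSET);
        return new String(stored, DEFAULT_CHARSET);
    }

    // Case 2: content of unknown origin. This sketch falls back to UTF-8
    // rather than the platform default, as one possible policy.
    static String decodeUnknown(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTripPrivate("cached entry"));
        System.out.println(decodeUnknown("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

In both cases the reader of the code can see which charset was chosen, instead of inferring it from an overload that takes no Charset argument.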

> They can be added if needed but there is an argument that you really need to know the charset when accessing a text file as it can be too fragile to assume the default encoding (esp. with files that are shared between users, applications,  or machines).

I wouldn't add them. Default charset content should never be shared between instances (though it frequently is).

When I have used the default charset, it's usually been in MIME type declarations for content encoded using the default charset. An example from JXTA:

private static final MimeMediaType DEFAULT_TEXT_ENCODING = new MimeMediaType(MimeMediaType.TEXT_DEFAULTENCODING, "charset=\"" + Charset.defaultCharset().name() + "\"", true);

My goal in adding a DEFAULT_CHARSET constant was to make use of the default charset more explicit. I definitely don't want to do anything which encourages inappropriate use of the default charset.



Mike



