Constants for standard charsets -- CR #4884238
Mike Duigou
mike.duigou at oracle.com
Tue Apr 12 17:38:17 UTC 2011
On Apr 12 2011, at 03:33, Alan Bateman wrote:
> Alan Bateman wrote:
>> I see your mail in the archives:
>>
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006487.html
>>
>> but I didn't receive it. I had a similar issue yesterday on another list but I've no idea where the problem is.
>>
>> -Alan
>>
> Just a couple of initial comments on the webrev:
>
> 1. In the standard charsets section of the class description, it might be useful to include a reference to Charsets, maybe "The {@link Charsets} class defines constants for each of the standard charsets".
OK
>
> 2. @see Charsets.DEFAULT, I assume this should be @see Charsets#DEFAULT_CHARSET
Correct. I changed it to DEFAULT_CHARSET and forgot to fix this link.
>
> 3. Looks like Charsets is using 2 rather than 4-space indenting.
Oops, I will correct this.
>
>
> 4. It would be nice to update java.nio.file.Path's class description to replace Charset.forName("UTF-8") with Charsets.UTF_8;
I will do so.
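A minimal sketch of what that replacement amounts to. The nested Charsets class below is a hypothetical stand-in written for illustration, not the code from the webrev; the helper method names are also assumptions.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CharsetConstantSketch {
    /** Hypothetical stand-in for the proposed Charsets class; not the webrev code. */
    static final class Charsets {
        static final Charset UTF_8 = Charset.forName("UTF-8");

        private Charsets() {}
    }

    /** Looks the charset up by name on every call. */
    static byte[] encodeWithLookup(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    /** Uses the shared constant; no string name to mistype at the call site. */
    static byte[] encodeWithConstant(String s) {
        return s.getBytes(Charsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        // Both paths should produce identical bytes.
        System.out.println(Arrays.equals(encodeWithLookup(text), encodeWithConstant(text)));
    }
}
```

The behaviour is the same either way; the constant mainly removes the name lookup and the risk of a typo in the charset name.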
> I was thinking more about DEFAULT_CHARSET and I'm not sure that we really need it. In the java.io package, all constructors that take a Charset also have a counterpart that uses the default charset; the same is true in java.lang.String and the java.util.zip package. In javax.tools.JavaCompiler I see that null can be used to select the default charset. In java.nio.file.Files we didn't include versions of readAllLines, newBufferedReader, etc. that don't take a Charset parameter.
I agree that requiring an explicit Charset is best because it makes it clear what charset is being used. For me, though, this argues for the DEFAULT_CHARSET declaration, because it's best to make it obvious when the default charset is being used.
I always interpret content being accessed with the default charset in one of two ways:
- Content that's known to be private and that the JVM wrote itself. Useful for caches because it's assumed that the default charset is the most efficient for that platform & configuration.
- Content that's potentially uninterpretable because it has an unknown charset and the default charset is the fallback choice. In recent times I've considered switching to using UTF-8 for unknown content.
Charset.defaultCharset() is possibly just as clear. Personally, I would use the constant, so that only Charsets constants are used for accessing content.
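To make the distinction concrete, here is a small sketch contrasting the implicit and explicit forms. The DEFAULT_CHARSET field is a hypothetical local constant mirroring the proposed one; the class and method names are assumptions for illustration only.

```java
import java.nio.charset.Charset;

public class ExplicitDefaultSketch {
    /** Hypothetical constant mirroring the proposed DEFAULT_CHARSET. */
    static final Charset DEFAULT_CHARSET = Charset.defaultCharset();

    /** Implicit: nothing at the call site says which charset is used. */
    static String decodeImplicit(byte[] bytes) {
        return new String(bytes);
    }

    /** Explicit: the reader can see that the platform default was chosen on purpose. */
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, DEFAULT_CHARSET);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(DEFAULT_CHARSET);
        // Both calls decode with the same charset; only the second one says so.
        System.out.println(decodeImplicit(bytes).equals(decodeExplicit(bytes)));
    }
}
```

Both methods decode identically; the difference is only in how clearly the call site documents the choice.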
> They can be added if needed but there is an argument that you really need to know the charset when accessing a text file as it can be too fragile to assume the default encoding (esp. with files that are shared between users, applications, or machines).
I wouldn't add them. Default charset content should never be shared between instances (though it frequently is).
When I have used the default charset it's usually been in MIME type declarations for content encoded using the default charset. An example from JXTA:
private static final MimeMediaType DEFAULT_TEXT_ENCODING = new MimeMediaType(MimeMediaType.TEXT_DEFAULTENCODING, "charset=\"" + Charset.defaultCharset().name() + "\"", true);
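For readers without JXTA at hand, a rough equivalent using only the JDK. It builds a plain content-type string rather than a MimeMediaType; the class name, method name, and the "text/plain" base type are assumptions for illustration.

```java
import java.nio.charset.Charset;

public class DefaultCharsetLabelSketch {
    /** Builds a content-type string that labels text with the platform default charset. */
    static String defaultTextContentType() {
        // Charset.name() returns the canonical name of the charset.
        return "text/plain; charset=\"" + Charset.defaultCharset().name() + "\"";
    }

    public static void main(String[] args) {
        // The exact charset name depends on the platform and JVM configuration.
        System.out.println(defaultTextContentType());
    }
}
```

Labelling the content this way lets a receiver decode it correctly even when its own default charset differs.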
My goal in adding a DEFAULT_CHARSET constant was to make use of the default charset more explicit. I definitely don't want to do anything which encourages inappropriate use of the default charset.
Mike