RFR: 8260265: UTF-8 by Default

Roger Riggs rriggs at openjdk.java.net
Wed Jul 14 15:46:15 UTC 2021


On Thu, 8 Jul 2021 21:23:00 GMT, Naoto Sato <naoto at openjdk.org> wrote:

> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of the changes is `Charset.defaultCharset()` returning `UTF-8` and `file.encoding` system property being added in the spec, but another notable modification is in `java.io.PrintStream` where it continues to use the `Console` encoding as the default charset instead of `UTF-8`. Other changes are mostly clarification of the term "default charset" and their links. Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

src/java.base/share/classes/java/io/ByteArrayOutputStream.java line 291:

> 289:      * method, which takes an encoding-name or charset argument,
> 290:      * or the {@code toString()} method, which uses the default
> 291:      * charset.

Fold to previous line.

src/java.base/share/classes/java/io/Console.java line 587:

> 585:                 try {
> 586:                     cs = Charset.forName(csname);
> 587:                 } catch (Exception ignored) { }

A separate enhancement...
I've long thought that there should be a way to avoid the exception here,
for example a Charset.forName(csname, default).
The caller might have a default in mind, or could supply null and then test for null.
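
A minimal sketch of the idea, written as a static helper outside the JDK. The names `CharsetLookup` and `forNameOrDefault` are hypothetical, chosen only for illustration. It wraps the existing `Charset.forName(String)` and returns the caller's fallback (possibly null) instead of throwing:

```java
import java.nio.charset.Charset;

// Hypothetical helper; the class and method names are assumptions for illustration.
final class CharsetLookup {
    private CharsetLookup() { }

    // Returns the named charset, or the fallback (which may be null) when the
    // name is null, syntactically illegal, or not supported by this JVM.
    static Charset forNameOrDefault(String csname, Charset fallback) {
        try {
            return Charset.forName(csname);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException,
            // both of which extend IllegalArgumentException.
            return fallback;
        }
    }
}
```

A caller such as the Console code above could then write `cs = CharsetLookup.forNameOrDefault(csname, null)` and test for null, with no empty catch block at the call site.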

src/java.base/share/classes/java/io/FileReader.java line 41:

> 39:  * @see InputStreamReader
> 40:  * @see FileInputStream
> 41:  * @see java.nio.charset.Charset#defaultCharset()

The @see duplicates the link above; the javadoc can do without it.

src/java.base/share/classes/java/io/InputStreamReader.java line 39:

> 37:  * java.nio.charset.Charset charset}.  The charset that it uses
> 38:  * may be specified by name or may be given explicitly, or the
> 39:  * {@link Charset#defaultCharset() default charset} may be accepted.

"may be accepted" reads as if the API has some choice in the matter.
Perhaps "accepted" -> "used".
The same applies in the other classes below, wherever there's a suitable replacement.

src/java.base/share/classes/java/io/PrintStream.java line 49:

> 47:  * <p> All characters printed by a {@code PrintStream} are converted into
> 48:  * bytes using the given encoding or charset, or the default
> 49:  * console charset if not specified.

JEP 400 doesn't give a rationale for using the console charset for PrintStream.
PrintStreams are used for output to files and other media, not just a tty/console.
System.out/err should use the console charset.
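
To illustrate the concern, here is a small sketch of a `PrintStream` whose sink is an in-memory buffer rather than a console. The class and method names (`PrintStreamCharsetDemo`, `encode`) are made up for this example. With an explicit charset, the bytes produced do not depend on any console setting:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: a PrintStream that writes to something other than a tty/console.
final class PrintStreamCharsetDemo {
    private PrintStreamCharsetDemo() { }

    // Encodes the text through a PrintStream that uses the given charset.
    static byte[] encode(String text, Charset cs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, cs)) {
            out.print(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(encode(s, StandardCharsets.UTF_8).length);      // 2 bytes
        System.out.println(encode(s, StandardCharsets.ISO_8859_1).length); // 1 byte
    }
}
```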

src/java.base/share/classes/java/lang/System.java line 802:

> 800:      * <tr><th scope="row">{@systemProperty file.encoding}</th>
> 801:      *     <td>The name of the default charset. Users may specify
> 802:      *     {@code UTF-8} or {@code COMPAT} on the command line to the value.

The wording could imply that only those two values can be supplied.
It could be rephrased to say that *if* the property is supplied on the command line
it overrides the default UTF-8.
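
As a quick way to observe this, the sketch below reports what a running JVM sees. The class name `DefaultCharsetCheck` is invented for this example. On a JDK that implements this change, running it with and without `-Dfile.encoding=COMPAT` should show the property overriding the default; the output depends on the JDK version and platform, so none is shown here:

```java
import java.nio.charset.Charset;

// Illustrative only: prints the file.encoding property and the resulting default charset.
final class DefaultCharsetCheck {
    private DefaultCharsetCheck() { }

    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```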

src/java.base/share/classes/java/net/URLDecoder.java line 92:

> 90: 
> 91:     // The default charset
> 92:     static String dfltEncName = URLEncoder.dfltEncName;

Perhaps add the value of file.encoding to the StaticProperties either as a string or as the Charset.
That would allow a few different lookups of the property to be simplified.

src/java.base/share/classes/java/net/URLEncoder.java line 165:

> 163:         try {
> 164:             str = encode(s, dfltEncName);
> 165:         } catch (UnsupportedEncodingException e) {

Perhaps as a separate cleanup: the Charset should be cached, not just the name, and the `encode(s, charset)` method used.
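
Roughly what such a cleanup could look like, sketched outside the JDK with a made-up class name (`CachedCharsetEncoder`); the actual `URLEncoder` internals may differ. Caching the `Charset` allows use of the existing `URLEncoder.encode(String, Charset)` overload, which does not throw `UnsupportedEncodingException`:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Illustrative only: caches the Charset object rather than its name.
final class CachedCharsetEncoder {
    // Resolved once instead of being looked up by name on every call.
    private static final Charset DEFAULT_CHARSET = Charset.defaultCharset();

    private CachedCharsetEncoder() { }

    static String encode(String s) {
        // No try/catch needed: this overload declares no checked exception.
        return URLEncoder.encode(s, DEFAULT_CHARSET);
    }
}
```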

src/java.base/share/classes/java/nio/charset/Charset.java line 601:

> 599:      * value designates {@code COMPAT}, the default charset is derived from
> 600:      * the {@code native.encoding} system property, which typically depends
> 601:      * upon the locale and charset of the underlying operating system.

The description in java.lang.System of the file.encoding property does not indicate it is 'implementation specific'.
In that context, it appears to be part of the JavaSE spec.
Having the spec in a single place with references to it from others could avoid duplication.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4733

