RFR: 8260265: UTF-8 by Default

Naoto Sato naoto at openjdk.java.net
Wed Jul 14 21:01:56 UTC 2021


On Wed, 14 Jul 2021 12:39:46 GMT, Giacomo Baso <github.com+12575901+gbaso at openjdk.org> wrote:

> > Consider an application that creates a java.io.FileWriter with its one-argument constructor and then uses it to write some text to a file. The resulting file will contain a sequence of bytes encoded using the default charset of the JDK running the application. A second application, run on a different machine or by a different user on the same machine, creates a java.io.FileReader with its one-argument constructor and uses it to read the bytes in that file. The resulting text contains a sequence of characters decoded using the default charset of the JDK running the second application. If the default charset differs between the JDK of the first application and the JDK of the second application, then the resulting text may be silently corrupted or incomplete, since these APIs replace erroneous input rather than fail.
> 
> It's even worse than that, because many OpenSSH installations are configured by default to [forward](https://man.openbsd.org/ssh_config.5#SendEnv) and [accept](https://man.openbsd.org/sshd_config.5#AcceptEnv) the user's locale environment variables (`LANG`, `LC_*`) (see, e.g., [RHEL 7](https://access.redhat.com/solutions/974273)).
> 
> So the same application, on the same remote machine, started by the same user, can unknowingly run with different locales, and therefore different encodings, depending on how the user connected to that machine. For example, from Windows, connecting via PowerShell results in `LANG=en_US.UTF-8`, while connecting from WSL2 results in `LANG=C.UTF-8`. With Java 11 on a RHEL 7 machine, `file.encoding` is `UTF-8` in the first case but `ANSI_X3.4-1968` in the second, leading to a default charset of `US-ASCII`.
> 
> It is also worth mentioning that `Charset.forName("default")` does not return the default charset: `default` is just an alias for `US-ASCII`, per `sun.nio.cs.StandardCharsets$Aliases`.
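For illustration, a minimal sketch of the failure mode discussed above, assuming Java 11 or later (for the `FileWriter(File, Charset)` and `FileReader(File, Charset)` constructors). The class `CharsetMismatchDemo` and its helper `roundTrip` are hypothetical names; writing with UTF-8 and reading back with US-ASCII stands in for two JDKs whose default charsets differ. It also prints what the running JDK derived from its environment, and what the name `default` resolves to if this JDK supports it.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetMismatchDemo {

    // Hypothetical helper: writes text with one charset and reads it back with
    // another, standing in for two JDKs with different default charsets.
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                try (Writer w = new FileWriter(file.toFile(), writeAs)) {
                    w.write(text);
                }
                StringBuilder sb = new StringBuilder();
                try (Reader r = new FileReader(file.toFile(), readAs)) {
                    char[] buf = new char[256];
                    int n;
                    while ((n = r.read(buf)) != -1) {
                        sb.append(buf, 0, n);
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // What this JDK derived from the environment (e.g. LANG) at startup.
        System.out.println("file.encoding      = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset()   = " + Charset.defaultCharset());

        // Per sun.nio.cs.StandardCharsets$Aliases, "default" names US-ASCII,
        // not the platform default; guarded in case a JDK does not support it.
        if (Charset.isSupported("default")) {
            System.out.println("forName(\"default\") = " + Charset.forName("default"));
        } else {
            System.out.println("\"default\" is not a supported charset name here");
        }

        // The escape keeps this source file independent of javac's own encoding.
        String original = "caf\u00e9";
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);

        // The mismatched read does not throw: FileReader replaces each of the
        // two non-ASCII UTF-8 bytes of U+00E9 with U+FFFD.
        System.out.println("same charset intact: " + same.equals(original));
        System.out.println("mismatched intact:   " + mismatched.equals(original));
        System.out.println("contains U+FFFD:     " + (mismatched.indexOf('\uFFFD') >= 0));
    }
}
```

On a JDK without UTF-8 by default (e.g. Java 11), running this under different `LANG` settings should change the first two lines of output, while the round-trip results stay the same, because the example passes charsets explicitly instead of relying on the default.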

Thanks. Updated the JEP.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4733


More information about the core-libs-dev mailing list