Draft JEP: To use UTF-8 as the default charset for the Java virtual machine.

Remi Forax forax at univ-mlv.fr
Wed Feb 21 11:04:19 UTC 2018


I agree with Uwe,
we should deprecate all methods/constructors that rely on the default charset.

And we should do that before changing to use UTF-8 by default.

Remi


On February 21, 2018 8:53:54 AM UTC, Uwe Schindler <uschindler at apache.org> wrote:
>Hi,
>
>> This draft JEP contains a proposal to use UTF-8 as the default
>> charset for the JVM, so that APIs that depend on the default
>> charset behave consistently across all platforms.
>> 
>> For more details, please see:
>> https://bugs.openjdk.java.net/browse/JDK-8187041
>
>Thanks for finally adding a JEP like this. Thanks also to Robert Muir
>for always insisting on fixing this problem! I have a few comments:
>
>The JEP should NOT lead to new APIs that convert between characters
>and bytes no longer accepting an explicit charset. One example is the
>proposed ByteBuffer methods taking String. The default ones would work
>with UTF-8, but it should still be possible for an API user to pass a
>charset whenever there is a conversion between bytes and chars. This
>is especially important because the user may still change the default
>and break your app. The rule is still: Only YOU, the developer, know
>the charset of your stuff when you load a JAR resource file or pass a
>String to the network in a ByteBuffer!
>
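A minimal sketch of the "always pass the charset" rule above. The class and method names are hypothetical and only illustrate the idea; the proposed ByteBuffer methods themselves are not shown, since their signatures are not given here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: every bytes <-> chars conversion names its charset.
public class ExplicitCharsetExample {

    // Encode with a caller-chosen charset instead of the JVM default.
    static ByteBuffer encode(String text, Charset charset) {
        return ByteBuffer.wrap(text.getBytes(charset));
    }

    // Decode with a caller-chosen charset; duplicate() leaves the
    // caller's buffer position untouched.
    static String decode(ByteBuffer buffer, Charset charset) {
        return charset.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // contains two non-ASCII characters
        ByteBuffer utf8 = encode(text, StandardCharsets.UTF_8);
        ByteBuffer latin1 = encode(text, StandardCharsets.ISO_8859_1);
        // The same String yields different byte counts per charset.
        System.out.println(utf8.remaining() + " vs " + latin1.remaining());
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));
    }
}
```

Because the charset is an argument, the result does not change if someone starts the JVM with a different default.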
>The biggest offenders here are also given as an example: FileReader
>and FileWriter. Although both classes subclass
>InputStreamReader/OutputStreamWriter and just pass the right delegate
>to the superclass in the ctor, both classes lack a way to specify a
>charset. Because of this, the use of FileReader and FileWriter is
>completely forbidden in many Apache projects (Apache Lucene, Solr,
>Elasticsearch, Apache TIKA,...). So I'd suggest also fixing the API
>here by adding the missing ctors.
>
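Until such ctors exist, the usual workaround is to build the reader or writer from the superclasses directly. A rough sketch, assuming a temporary file is acceptable for the demo (the class name and helper methods are hypothetical):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for FileReader/FileWriter with an explicit charset.
public class ExplicitFileCharset {

    static void write(File file, String text, Charset cs) throws IOException {
        // Same delegate FileWriter would create, but the charset is named.
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), cs)) {
            w.write(text);
        }
    }

    static String read(File file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(file), cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Writes and re-reads the text through a temp file using one charset.
    static String roundTrip(String text, Charset cs) {
        try {
            File tmp = File.createTempFile("charset-demo", ".txt");
            try {
                write(tmp, text, cs);
                return read(tmp, cs);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "na\u00efve";
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```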
>The Java 7+ methods in java.nio.file.Files already ignore the default
>charset and always use UTF-8. How to proceed with those? Should they be
>changed to follow the new mechanism? I'd suggest not doing this, as
>it's part of the spec (to use UTF-8) and should not rely on external
>forces, but I wanted to bring this up.
>
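For illustration, a small sketch of the Files behavior described above, assuming a JDK where the charset-less Files.readAllLines(Path) overload is available (Java 8 or later, as far as I know). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical demo: Files.readAllLines(Path) decodes as UTF-8 per its
// spec, independent of the JVM's default charset.
public class FilesUtf8Demo {

    static List<String> writeAndRead(String line) {
        try {
            Path tmp = Files.createTempFile("files-utf8", ".txt");
            try {
                // Write raw UTF-8 bytes so no default charset is involved.
                Files.write(tmp, line.getBytes(StandardCharsets.UTF_8));
                // No charset argument: the spec fixes this to UTF-8.
                return Files.readAllLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = "caf\u00e9";
        System.out.println(writeAndRead(line).get(0).equals(line));
    }
}
```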
>Changing the default would help many users, if they are actually using
>newer JDKs. Those on older versions (and compiling their code against
>older versions) still have to avoid the default charset. In addition,
>as you can still change the "default charset", any library developer
>reading resources from its own JAR file or passing Strings to network
>protocols cannot rely on the default charset really being UTF-8 (a
>user may have changed it to something else). Because of this, Apache
>libraries will forbid usage of all methods using default charsets (and
>locales + timezones). The "changeable default" does not affect
>application developers (because in most cases they control the
>environment), but library developers should always be explicit!
>
>For this to work, I also want to do some "advertisement": All library
>projects should use the Forbidden-Apis Maven/Gradle/Ant plugin to scan
>their bytecode for offenders using default charsets, default locales or
>relying on default timezones. See the blog post about it [1] and the
>project page [2]. The tool is also useful to replace "jdeps" in
>projects with Java versions before 8, as it can scan your code for
>access to internal JDK APIs, too. See the documentation [3] and GitHub
>wiki pages for useful examples. It may also be a good idea to mention
>it in the JEP as a "workaround" or "further reading".
>
>Finally: Because one can still change the default, I'd propose to
>deprecate all methods that use a default charset (independent of
>actually changing the default). Only that would make tools like
>"forbiddenapis" irrelevant for library developers.
>
>And finally, finally: I'd also propose to change the default Locale to
>Locale.ROOT (same issues). String.toLowerCase() in Turkish locales
>still breaks thousands of apps! That's a different JEP, but I would
>strongly support it!
>
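The Turkish case can be reproduced in a few lines. This is only a sketch with a hypothetical class name; it passes the locale explicitly so the result does not depend on the machine it runs on.

```java
import java.util.Locale;

// Hypothetical demo of locale-sensitive lowercasing.
public class TurkishLowerCaseDemo {

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // In Turkish, 'I' lowercases to dotless i (U+0131), so comparing
        // against an ASCII keyword such as "title" fails.
        System.out.println(lower("TITLE", turkish).equals("title"));
        // Locale.ROOT gives the locale-neutral mapping.
        System.out.println(lower("TITLE", Locale.ROOT).equals("title"));
    }
}
```

The no-argument String.toLowerCase() behaves like the first call whenever the JVM's default locale is Turkish.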
>Uwe
>
>[1]
>http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
>[2] https://github.com/policeman-tools/forbidden-apis
>[3] https://jenkins.thetaphi.de/job/Forbidden-APIs/javadoc/
>
>-----
>Uwe Schindler
>uschindler at apache.org 
>ASF Member, Apache Lucene PMC / Committer
>Bremen, Germany
>http://lucene.apache.org/

