Making the source code utf-8
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Tue Feb 7 12:28:11 UTC 2023
Currently, the source code in the JDK is in an ill-defined encoding.
There is no official declaration of the encoding used. It is "mostly
ASCII", but the relatively few non-ASCII characters used are not
well-defined. In many cases, it is latin-1, but I am pretty certain
other encodings are used for e.g. Asian translations.
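To illustrate why an undeclared encoding is a problem, here is a minimal sketch (not JDK code; the class `EncodingAmbiguity` and the method `roundTrip` are hypothetical names). The same text saved as Latin-1 and read as UTF-8, or the other way round, silently turns into different characters:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingAmbiguity {
    // Encode text with one charset, then decode the resulting bytes with another.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs); // malformed input is replaced, not reported
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, written as an escape so this source file stays ASCII.
        String text = "caf\u00E9";

        // Latin-1 stores e-acute as the single byte 0xE9, which is not a complete
        // UTF-8 sequence, so a UTF-8 reader substitutes the replacement character U+FFFD.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // UTF-8 stores e-acute as 0xC3 0xA9, which a Latin-1 reader shows as two
        // characters: U+00C3 (A-tilde) followed by U+00A9 (copyright sign).
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Neither direction throws an error, which is why such mismatches tend to go unnoticed until something downstream breaks.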
This is creating unnecessary problems when working with the JDK code
base, while providing no benefit. We ended up here not by choice, but by
historical accident. Most recently, this issue has surfaced in
JDK-8301853, JDK-8301854 and JDK-8301855, but issues relating to this
have popped up from time to time, e.g. JDK-8263028.
As JEP 400[1] confirms, UTF-8 is the way to go. We should follow up on
this by converting our code base to UTF-8.
I have created JDK-8301971[2] with the intention of converting all files
to UTF-8, and updating all infrastructure to recognize this fact.
Even though 99.9% of all text in the JDK repository is ASCII only, with
a code base the size of the JDK there are of course many, many instances
that need to be checked and/or converted. I can take care of the
overarching issues, like updating compiler flags and developing tooling
to detect non-ASCII files and try to convert them based on my best
guesses, but in the end, there are likely to be many files that need to
be verified by their respective teams, to make sure I did not assume the
incorrect source encoding.
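As a rough idea of what such detection tooling could look like, here is a minimal sketch (the class `EncodingScanner` and its `classify` method are hypothetical, not existing JDK build tooling). It walks a directory tree and reports every file that contains non-ASCII bytes, split into those that already decode as strict UTF-8 and those that do not and therefore need a decision about their real encoding:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class EncodingScanner {
    enum Kind { ASCII, UTF8, NOT_UTF8 }

    static Kind classify(byte[] bytes) {
        boolean asciiOnly = true;
        for (byte b : bytes) {
            if (b < 0) { // Java bytes are signed, so values 0x80..0xFF are negative
                asciiOnly = false;
                break;
            }
        }
        if (asciiOnly) {
            return Kind.ASCII;
        }
        try {
            // A strict decoder throws instead of inserting U+FFFD on bad input.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return Kind.UTF8;
        } catch (CharacterCodingException e) {
            return Kind.NOT_UTF8; // probably Latin-1 or another legacy encoding
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    Kind kind = classify(Files.readAllBytes(p));
                    if (kind != Kind.ASCII) {
                        System.out.println(kind + "\t" + p);
                    }
                } catch (IOException e) {
                    System.err.println("could not read " + p + ": " + e.getMessage());
                }
            });
        }
    }
}
```

This sketch does not skip binary files, and a file that happens to decode as valid UTF-8 is not proof that UTF-8 was the intended encoding. That is why the output can only be a starting point for the per-team verification described above.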
So, before I go ahead and start doing this, I want to check:
* Is everyone on board with this idea? I do assume that in 2023, having
UTF-8 encoding for text files is (or should be) a no-brainer, but I want
to verify that there is no-one opposing this.
* Should I open a JEP for this? On the one hand, it is likely to require
a non-trivial amount of work, but on the other hand, there is no change
visible to the end user, so it would be kind of pointless to announce.
For my part, I could go either way, so I'm interested in hearing
opinions, preferably with good rationales, for one way or the other.
/Magnus
[1] https://openjdk.org/jeps/400
[2] https://bugs.openjdk.org/browse/JDK-8301971