Making the source code utf-8

Jonathan Gibbons jonathan.gibbons at oracle.com
Tue Feb 7 18:33:59 UTC 2023


FYI

open/make/jdk/src/classes/build/tools/cldrconverter/ResourceBundleGenerator.java: encoding = "iso-8859-1";
open/make/Docs.gmk:    -encoding ISO-8859-1 -docencoding UTF-8 -breakiterator \
open/make/Docs.gmk:    -encoding ISO-8859-1 -breakiterator -splitIndex --system none \


Even if we choose to keep the code base on ASCII, these references to 
iso-8859-1 should be fixed.
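
To illustrate why a stale encoding setting matters once non-ASCII 
characters are involved, here is a minimal sketch. It is not taken from 
the JDK build; the class and method names are hypothetical. It decodes 
UTF-8 bytes as ISO-8859-1, which is roughly what a tool told to read 
ISO-8859-1 would do with a file saved as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK sources.
final class EncodingMismatchDemo {

    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1,
    // mimicking a reader configured with the wrong source encoding.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter.
        String original = "caf\u00e9";
        String garbled = misdecode(original);
        // The single accented character becomes two characters.
        System.out.println(original.length() + " -> " + garbled.length());
    }
}
```

Pure ASCII text passes through unchanged, which is presumably why such 
a mismatch can go unnoticed in a code base that is almost entirely ASCII.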

-- Jon


On 2/7/23 4:28 AM, Magnus Ihse Bursie wrote:
> Currently, the source code in the JDK is in an ill-defined encoding. 
> There is no official declaration of the encoding used. It is "mostly 
> ASCII", but the relatively few non-ASCII characters used are not 
> well-defined. In many cases, it is latin-1, but I am pretty certain 
> other encodings are used for e.g. Asian translations.
>
> This is creating unnecessary problems when working with the JDK 
> code base, while providing no benefit. We ended up here not by choice, 
> but by historical accident. Most recently, this issue has surfaced in 
> JDK-8301853, JDK-8301854 and JDK-8301855, but related issues have 
> popped up from time to time, e.g. JDK-8263028.
>
> As JEP 400[1] confirms, UTF-8 is the way to go. We should follow up on 
> this by converting our code base to UTF-8.
>
> I have created JDK-8301971[2] with the intention of converting all 
> files to UTF-8, and updating all infrastructure to recognize this fact.
>
> Even though 99.9% of all text in the JDK repository is ASCII only, 
> with a code base the size of the JDK there are of course many, many 
> instances that need to be checked and/or converted. I can take care 
> of the overarching issues, like updating compiler flags and developing 
> tooling to detect and try to convert non-ASCII files based on my best 
> guesses, but in the end, there are likely to be many files which need 
> to be verified by their respective teams, to confirm that I did not 
> assume an incorrect source encoding.
>
> So, before I go ahead and start doing this, I want to check:
>
> * Is everyone on board with this idea? I do assume that in 2023, having 
> UTF-8 encoding for text files is (or should be) a no-brainer, but I 
> want to verify that no one opposes this.
>
> * Should I open a JEP for this? On the one hand, it is likely to 
> require a non-trivial amount of work, but on the other hand, there is 
> no change visible for the end user, so it will be kind of pointless to 
> announce. For my part, I could go either way, so I'm interested in 
> hearing opinions, preferably with good rationales, for one way or the 
> other.
>
> /Magnus
>
> [1] https://openjdk.org/jeps/400
> [2] https://bugs.openjdk.org/browse/JDK-8301971
>
>
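
As a rough idea of what the detection tooling mentioned in the quoted 
message might look like, here is a minimal sketch. It is an assumption 
about one possible approach, not the actual tooling; the class, enum, 
and method names are hypothetical. It classifies a file's bytes as pure 
ASCII, valid UTF-8, or something else that needs a human to identify 
the original encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK build.
final class SourceEncodingCheck {

    enum Kind { ASCII, UTF8, OTHER }

    static Kind classify(byte[] bytes) {
        // Java bytes are signed, so any byte with the high bit set is negative.
        boolean ascii = true;
        for (byte b : bytes) {
            if (b < 0) {
                ascii = false;
                break;
            }
        }
        if (ascii) {
            return Kind.ASCII;
        }
        try {
            // A strict decoder throws on malformed input instead of
            // silently substituting replacement characters.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return Kind.UTF8;
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: could be latin-1 or another legacy encoding.
            return Kind.OTHER;
        }
    }
}
```

Files reported as OTHER are the ones that would need review by the 
owning team, since the bytes alone do not say which legacy encoding 
was intended.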


More information about the jdk-dev mailing list