<div dir="ltr">+1 to making the code build regardless of the user's environment / locale.<br><div><br></div><div>Would it be possible to enforce ASCII by default, and allow UTF-8 only in exceptional cases? This would give us one extra layer of protection against Trojan Source attacks [1].</div><div><br></div><div>Regards,</div><div>Daniel</div><div><br></div><div>[1] <a href="https://trojansource.codes/">https://trojansource.codes/</a></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 7 Feb 2023 at 13:28, Magnus Ihse Bursie &lt;<a href="mailto:magnus.ihse.bursie@oracle.com">magnus.ihse.bursie@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Currently, the source code in the JDK is in an ill-defined encoding. <br>

There is no official declaration of the encoding used. It is "mostly <br>

ASCII", but the relatively few non-ASCII characters used are not <br>

well-defined. In many cases, it is latin-1, but I am pretty certain <br>

other encodings are used for e.g. Asian translations.<br>

<br>

This is creating unnecessary problems when working with the JDK code <br>

base, while providing no benefit. We ended up here not by choice, but by <br>

historical accident. Most recently, this issue has surfaced in <br>

JDK-8301853, JDK-8301854 and JDK-8301855, but issues related to this <br>

have popped up from time to time, e.g. JDK-8263028.<br>

<br>

As JEP 400[1] confirms, UTF-8 is the way to go. We should follow up on <br>

this by converting our code base to UTF-8.<br>

<br>

I have created JDK-8301971[2] with the intention of converting all files <br>

to UTF-8, and updating all infrastructure to recognize this fact.<br>

<br>

Even though 99.9% of all text in the JDK repository is ASCII only, with <br>

a code base the size of the JDK there are of course many, many instances <br>

that need to be checked and/or converted. I can take care of the <br>

overarching issues, like updating compiler flags and developing tooling to <br>

detect and try to convert non-ASCII files based on my best guesses, but <br>

in the end, there are likely to be many files that need to be verified <br>

by their respective teams, to make sure I did not assume an incorrect source <br>

encoding.<br>

<br>

So, before I go ahead and start doing this, I want to check:<br>

<br>

* Is everyone on board with this idea? I do assume that in 2023, having <br>

UTF-8 encoding for text files is (or should be) a no-brainer, but I want <br>

to verify that no one opposes this.<br>

<br>

* Should I open a JEP for this? On the one hand, it is likely to require <br>

a non-trivial amount of work, but on the other hand, there is no change <br>

visible for the end user, so it will be kind of pointless to announce. <br>

For my part, I could go either way, so I'm interested in hearing <br>

opinions, preferably with good rationales, for one way or the other.<br>

<br>

/Magnus<br>

<br>

[1] <a href="https://openjdk.org/jeps/400" rel="noreferrer" target="_blank">https://openjdk.org/jeps/400</a><br>

[2] <a href="https://bugs.openjdk.org/browse/JDK-8301971" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8301971</a><br>

<br>

<br>

</blockquote></div>
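<div><br></div><div>To illustrate the ASCII-by-default idea together with the detection tooling mentioned in the quoted message, here is a minimal sketch in Python. The names classify_encoding, violations and utf8_allowed are hypothetical and are not existing JDK tooling. The sketch assumes a file counts as acceptable if it is pure ASCII, or if it is valid UTF-8 and explicitly allowlisted. Strict UTF-8 decoding is only a heuristic: a legacy-encoded file can occasionally happen to be valid UTF-8, so flagged and allowlisted files would still need review by their owners.</div>
<pre>
```python
def classify_encoding(data):
    """Classify raw file bytes as 'ascii', 'utf-8' or 'unknown'."""
    if data.isascii():
        return "ascii"
    try:
        # Strict decoding rejects byte sequences that are not well-formed UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Likely latin-1 or another legacy encoding; needs manual verification.
        return "unknown"
    return "utf-8"


def violations(files, utf8_allowed):
    """Return (name, reason) pairs for files that break the policy.

    files: mapping of file name to raw bytes (hypothetical input shape).
    utf8_allowed: set of file names permitted to contain non-ASCII UTF-8.
    """
    problems = []
    for name, data in sorted(files.items()):
        kind = classify_encoding(data)
        if kind == "unknown":
            problems.append((name, "not valid UTF-8"))
        elif kind == "utf-8" and name not in utf8_allowed:
            problems.append((name, "non-ASCII content outside the allowlist"))
    return problems
```
</pre>
<div>In a real check, the files mapping would be filled by reading each tracked file in binary mode. The sketch does not look for bidirectional control characters inside allowlisted files, which the Trojan Source concern would also require.</div>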