<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        font-size:10.0pt;

        font-family:"Calibri",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

span.EmailStyle19

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:72.0pt 72.0pt 72.0pt 72.0pt;}

div.WordSection1

        {page:WordSection1;}

--></style>

</head>

<body lang="en-CH" link="blue" vlink="purple" style="word-wrap:break-word">

<div class="WordSection1">

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;mso-fareast-language:EN-US">I would welcome this change.<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;mso-fareast-language:EN-US">A later discussion may focus on whether a BOM would be helpful and/or required, even if that runs somewhat counter to the usual recommendations.<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;mso-fareast-language:EN-US">Raffaello<o:p></o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">

<p class="MsoNormal" style="mso-margin-top-alt:0cm;margin-right:0cm;margin-bottom:12.0pt;margin-left:36.0pt">

<b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">jdk-dev &lt;jdk-dev-retn@openjdk.org&gt; on behalf of Magnus Ihse Bursie &lt;magnus.ihse.bursie@oracle.com&gt;<br>

<b>Date: </b>Tuesday, 7 February 2023 at 13:28<br>

<b>To: </b>jdk-dev@openjdk.org &lt;jdk-dev@openjdk.org&gt;<br>

<b>Subject: </b>Making the source code utf-8<o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal" style="mso-margin-top-alt:0cm;margin-right:0cm;margin-bottom:12.0pt;margin-left:36.0pt">

<span style="font-size:11.0pt">Currently, the source code in the JDK is in an ill-defined encoding.

<br>

There is no official declaration of the encoding used. It is "mostly <br>

ASCII", but the relatively few non-ASCII characters used are not <br>

well-defined. In many cases, it is Latin-1, but I am pretty certain <br>

other encodings are used for e.g. Asian translations.<br>

<br>

This is creating unnecessary problems when working with the JDK code <br>

base, while providing no benefit. We ended up here not by choice, but by <br>

historical accident. Most recently, this issue has surfaced in <br>

JDK-8301853, JDK-8301854 and JDK-8301855, but issues relating to this <br>

have popped up from time to time, e.g. JDK-8263028.<br>

<br>

As JEP 400[1] confirms, UTF-8 is the way to go. We should follow up on <br>

this by converting our code base to UTF-8.<br>

<br>

I have created JDK-8301971[2] with the intention of converting all files <br>

to UTF-8, and updating all infrastructure to recognize this fact.<br>

<br>

Even though 99.9% of all text in the JDK repository is ASCII only, with <br>

a code base the size of the JDK there are of course many, many instances <br>

that need to be checked and/or converted. I can take care of the <br>

overarching issues, like updating compiler flags and developing tooling to <br>

detect, and try to convert, non-ASCII files based on my best guesses, but <br>

in the end, there are likely to be many files which need to be verified <br>

by their respective teams, to make sure I did not assume the wrong source <br>

encoding.<br>

<br>

So, before I go ahead and start doing this, I want to check:<br>

<br>

* Is everyone on board with this idea? I do assume that in 2023, having <br>

UTF-8 encoding for text files is (or should be) a no-brainer, but I want <br>

to verify that no one opposes this.<br>

<br>

* Should I open a JEP for this? On the one hand, it is likely to require <br>

a non-trivial amount of work, but on the other hand, there is no change <br>

visible for the end user, so it will be kind of pointless to announce. <br>

For my part, I could go either way, so I'm interested in hearing <br>

opinions, preferably with good rationales, for one way or the other.<br>

<br>

/Magnus<br>

<br>

[1] <a href="https://openjdk.org/jeps/400">https://openjdk.org/jeps/400</a><br>

[2] <a href="https://bugs.openjdk.org/browse/JDK-8301971">https://bugs.openjdk.org/browse/JDK-8301971</a><br>

<br>

<o:p></o:p></span></p>

</div>
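
<p class="MsoNormal"><span style="font-size:11.0pt">The detection and best-guess conversion mentioned in the quoted message could look roughly like the sketch below. This is only an illustration under stated assumptions, not the tooling planned for JDK-8301971: the class name EncodingCheck and its methods are hypothetical, and Latin-1 is assumed as the fallback guess, which the message says holds in many cases but not all. A file flagged as OTHER would still need to be verified by the owning team.<o:p></o:p></span></p>

<pre>
```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK build: classifies raw file
// content as pure ASCII, valid UTF-8, or something else.
final class EncodingCheck {

    enum Kind { ASCII, UTF8, OTHER }

    static Kind classify(byte[] bytes) {
        boolean ascii = true;
        for (byte b : bytes) {
            // Java bytes are signed: 0x00..0x7F are non-negative,
            // anything with the high bit set is negative.
            if (b >= 0) continue;
            ascii = false;
            break;
        }
        if (ascii) return Kind.ASCII;

        // Strict decoder: report malformed input instead of replacing it.
        var decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return Kind.UTF8;
        } catch (CharacterCodingException e) {
            return Kind.OTHER;
        }
    }

    // Best-guess conversion, ASSUMING the input is Latin-1 (ISO-8859-1).
    // Every byte sequence decodes as Latin-1, so this never fails, which
    // is exactly why the result needs human review.
    static byte[] latin1ToUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }
}
```
</pre>

<p class="MsoNormal"><span style="font-size:11.0pt">Files classified as ASCII or UTF8 need no conversion; only the OTHER bucket requires a guess at the original encoding.<o:p></o:p></span></p>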

</div>

</body>

</html>