<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>On 2023-02-07 14:07, Daniel Jeliński wrote:<br>

    </p>

    <blockquote type="cite" cite="mid:CAMrH03KDwJ+9vbhQ8EQG6nkuYcuCyK0BqB7AOmtmAaCAQL716Q@mail.gmail.com">

      <div dir="ltr">+1 to make the code build regardless of the user's

        environment / locale.<br>

        <div><br>

        </div>

        <div>Would it be possible to enforce ASCII by default, and allow

          UTF-8 in exceptional cases? This would give us one extra layer

          of protection against trojan sources [1]</div>

      </div>

    </blockquote>

    <p>ASCII-only certainly has its advantages, yes, including

      protection against that kind of attack. <br>

    </p>

    <p>I think we need to treat the entire code base as UTF-8, e.g. in

      terms of what arguments we send to compilers. <br>

    </p>

    <p>With that said, we could extend jcheck to separately check whether a

      file contains non-ASCII characters, and block such changes from being

      pushed. <br>

    </p>
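    <p>To make the idea concrete, here is a minimal sketch of such a check,
      written in Python rather than as an actual jcheck extension. The
      function name <code>find_non_ascii_lines</code> is made up for
      illustration, and this is not how jcheck is implemented; it only
      shows that, if we treat the files as UTF-8, finding non-ASCII
      content amounts to looking for bytes outside the 0x00-0x7F range.</p>

<pre>
```python
def find_non_ascii_lines(data):
    """Return the 1-based numbers of lines that contain any non-ASCII byte.

    `data` is the raw content of a file as bytes. Hypothetical helper,
    not part of jcheck.
    """
    hits = []
    # bytes.splitlines() only splits on ASCII line endings, so multi-byte
    # UTF-8 sequences are never cut in half.
    for number, line in enumerate(data.splitlines(), start=1):
        # bytes.isascii() is True only if every byte is in 0x00-0x7F.
        if not line.isascii():
            hits.append(number)
    return hits


if __name__ == "__main__":
    sample = "plain line\nna\u00efve caf\u00e9\nanother plain line\n".encode("utf-8")
    print(find_non_ascii_lines(sample))
```
</pre>

    <p>A real check would presumably run only on the lines touched by a
      change, so that existing non-ASCII content does not block unrelated
      work.</p>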

    <p>The question then becomes: how do we handle exceptions? By having

      a global "allow-list" naming the files that are allowed to

      contain non-ASCII characters? By requiring them to follow

      a certain name pattern? By inserting some kind of metadata

      marker in them that flags them as non-ASCII?</p>

    <p>These are the only options I can think of, and none of them sound

      attractive to me.<br>

    </p>

    <p>A better approach, I think, is to have some kind of jcheck

      "warning" (not a blocker for integration) that tells you that you

      have non-ASCII characters in the code you are about to check in.

      That will, hopefully, be enough to catch the unintentional introduction

      of e.g. typographic quotes, as well as malicious attacks (provided that

      reviewers stay alert to such warnings).</p>
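    <p>As a rough sketch of what such a non-blocking warning could report
      (again in Python, with the hypothetical names
      <code>non_ascii_warnings</code> and <code>BIDI_CONTROLS</code>, and an
      output format I made up): every non-ASCII character gets a warning
      line with its position and Unicode name, and the bidirectional
      control characters that the Trojan Source attacks rely on are
      labelled separately so that they stand out to a reviewer.</p>

<pre>
```python
import unicodedata

# Bidirectional control characters (U+202A..U+202E and U+2066..U+2069),
# the kind of character the "Trojan Source" attacks rely on.
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))


def non_ascii_warnings(text):
    """Return one warning string per non-ASCII character in `text`.

    Hypothetical helper: it only reports, it never rejects anything.
    """
    warnings = []
    # Split on "\n" only; str.splitlines() would also split on some
    # non-ASCII separators and silently drop them from the scan.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch.isascii():
                continue
            name = unicodedata.name(ch, "UNNAMED")
            kind = "bidi control" if ord(ch) in BIDI_CONTROLS else "non-ASCII"
            warnings.append(
                f"warning: {line_no}:{col}: {kind} U+{ord(ch):04X} ({name})"
            )
    return warnings


if __name__ == "__main__":
    sample = 'String s = \u201chello\u201d;\nint x = 1; \u202e\n'
    for message in non_ascii_warnings(sample):
        print(message)
```
</pre>

    <p>Since the function only returns messages, the caller decides what to
      do with them, which fits the "warn, do not block" idea.</p>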

    <p>/Magnus<br>

    </p>


    <blockquote type="cite" cite="mid:CAMrH03KDwJ+9vbhQ8EQG6nkuYcuCyK0BqB7AOmtmAaCAQL716Q@mail.gmail.com">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Regards,</div>

        <div>Daniel</div>

        <div><br>

        </div>

        <div>[1] <a href="https://trojansource.codes/" moz-do-not-send="true" class="moz-txt-link-freetext">https://trojansource.codes/</a></div>


      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Tue, 7 Feb 2023 at 13:28, Magnus

          Ihse Bursie &lt;<a href="mailto:magnus.ihse.bursie@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">magnus.ihse.bursie@oracle.com</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Currently,

          the source code in the JDK is in an ill-defined encoding. <br>

          There is no official declaration of the encoding used. It is

          "mostly <br>

          ASCII", but the relatively few non-ASCII characters used are

          not <br>

          well-defined. In many cases, it is latin-1, but I am pretty

          certain <br>

          other encodings are used for e.g. Asian translations.<br>

          <br>

          This is creating unnecessary problems when working with the

          JDK code <br>

          base, while providing no benefit. We ended up here not by

          choice, but by <br>

          historical accident. Most recently, this issue has surfaced in

          <br>

          JDK-8301853, JDK-8301854 and JDK-8301855, but related issues

          have popped up <br>

          from time to time, e.g. JDK-8263028.<br>

          <br>

          As JEP 400[1] confirms, UTF-8 is the way to go. We should

          follow up on <br>

          this by converting our code base to UTF-8.<br>

          <br>

          I have created JDK-8301971[2] with the intention of converting

          all files <br>

          to UTF-8, and updating all infrastructure to recognize this

          fact.<br>

          <br>

          Even though 99.9% of all text in the JDK repository is ASCII

          only, with <br>

          a code base the size of the JDK there are of course many, many

          instances <br>

          that need to be checked and/or converted. I can take care of

          the <br>

          overarching issues, like updating compiler flags and developing

          tooling to <br>

          detect and try to convert non-ASCII files based on my best

          guesses, but <br>

          in the end, there are likely to be many files which need to

          be verified <br>

          by their respective teams, to make sure I did not assume an

          incorrect source <br>

          encoding.<br>

          <br>

          So, before I go ahead and start doing this, I want to check:<br>

          <br>

          * Is everyone on board with this idea? I do assume that in

          2023, having <br>

          UTF-8 encoding for text files is (or should be) a no-brainer,

          but I want <br>

          to verify that there is no-one opposing this.<br>

          <br>

          * Should I open a JEP for this? On the one hand, it is likely

          to require <br>

          a non-trivial amount of work, but on the other hand, there is

          no change <br>

          visible for the end user, so it will be kind of pointless to

          announce. <br>

          For my part, I could go either way, so I'm interested in

          hearing <br>

          opinions, preferably with good rationales, for one way or the

          other.<br>

          <br>

          /Magnus<br>

          <br>

          [1] <a href="https://openjdk.org/jeps/400" rel="noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://openjdk.org/jeps/400</a><br>

          [2] <a href="https://bugs.openjdk.org/browse/JDK-8301971" rel="noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://bugs.openjdk.org/browse/JDK-8301971</a><br>

          <br>

          <br>

        </blockquote>

      </div>

    </blockquote>

  </body>

</html>