<div dir="ltr"><div>Hi Jon et al.,<br></div><div><br></div><div>Picking this email thread back up regarding <a href="https://bugs.openjdk.org/browse/JDK-8269957" target="_blank">JDK-8269957</a>: "facilitate alternate impls of NameTable and Name" after a detour to incorporate these recent changes/fixes:</div><div><ul><li>JDK-8303526: Changing "arbitrary" Name.compareTo() ordering breaks the regression suite</li><li>JDK-8303623: Compiler should disallow non-standard UTF-8 string encodings</li></ul></div><div>I've created a new (draft) PR - <a href="https://github.com/openjdk/jdk/pull/13282" target="_blank">https://github.com/openjdk/jdk/pull/13282</a></div><div><br></div><div>When you get a chance, let me know what you think.</div><div><br></div><div>Thanks,<br></div><div>-Archie<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 6, 2023 at 6:57 PM Archie Cobbs &lt;<a href="mailto:archie.cobbs@gmail.com" target="_blank">archie.cobbs@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div dir="ltr">On Mon, Mar 6, 2023 at 4:15 PM Jonathan Gibbons &lt;<a href="mailto:jonathan.gibbons@oracle.com" target="_blank">jonathan.gibbons@oracle.com</a>&gt; wrote:</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


  <div>

    <p>Yes, as a general rule, the compiler and runtime should be

      mutually consistent.</p></div></blockquote><div>I've updated the PR to check for classfile major version &lt; 48. In that case, longer-than-necessary encodings are allowed.<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>

    <p>This discussion probably also applies to javac reading names in

      source files and having those names propagate to class files.</p></div></blockquote><div>Agreed. Though I think we're good here because the Lexer/Scanner uses a CharsetDecoder that detects errors on malformed input. As a simple test, I verified that a StandardCharsets.UTF_8 decoder reports "MALFORMED" on input with "è" encoded as <span style="font-family:monospace">c3 e8</span> (the valid encoding is <span style="font-family:monospace">c3 a8</span>).</div><div><br></div><div>And after the lexer step, you're going from char[] to byte[], and those conversions are already being done correctly in the compiler code.<br></div><div><br></div><div>It's the byte[] to char[] step in which "non-standard" encodings can creep in.<br></div></div><br clear="all"></div><div>-Archie</div><div><br></div><div>-- <br><div dir="ltr">Archie L. Cobbs<br></div></div></div>

</blockquote></div><br clear="all"><br><span>-- </span><br><div dir="ltr">Archie L. Cobbs<br></div>