<HR><div dir="ltr"><div><div dir="ltr">On Mon, Mar 6, 2023 at 4:15 PM Jonathan Gibbons &lt;<a href="mailto:jonathan.gibbons@oracle.com">jonathan.gibbons@oracle.com</a>&gt; wrote:</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Yes, as a general rule, the compiler and runtime should be
mutually consistent.</p></div></blockquote><div>I've updated the PR to check the classfile major version: when it is &lt; 48, longer-than-necessary encodings are allowed.<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<p>This discussion probably also applies to javac reading names in
source files and having those names propagate to class files.</p></div></blockquote><div>Agreed. Though I think we're good here because the Lexer/Scanner uses a CharsetDecoder that detects errors on malformed input. As a simple test I verified that a StandardCharsets.UTF_8 decoder returns "MALFORMED" on input with "&egrave;" encoded as <span style="font-family:monospace">c3 e8</span>.</div><div><br></div><div>And after the lexer step, you're going from char[] to byte[], and those conversions are already being done correctly in the compiler code.<br></div><div><br></div><div>It's the byte[] to char[] step in which "non-standard" encodings can creep in.<br></div></div><br clear="all"></div><div>-Archie</div><div><br></div><div>-- <br><div dir="ltr" class="gmail_signature">Archie L. Cobbs<br></div></div></div>
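<div>For anyone who wants to reproduce the simple decoder test mentioned above, here is a minimal sketch. It uses only the standard java.nio.charset API; the class name MalformedUtf8Check and the helper decodeStrict are made up for illustration and are not part of javac or the PR. It also feeds the decoder the longer-than-necessary two-byte form <span style="font-family:monospace">c0 80</span>, which I would expect the standard UTF-8 decoder to reject as well. Note this exercises the standard charset decoder only, not the compiler's classfile-reading code.</div>

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the compiler sources.
public class MalformedUtf8Check {

    // Decode the bytes with a strict UTF-8 decoder and return the raw CoderResult,
    // so the caller can inspect whether the input was reported as malformed.
    static CoderResult decodeStrict(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(16);
        // endOfInput = true: there are no further bytes after this buffer.
        return decoder.decode(ByteBuffer.wrap(input), out, true);
    }

    public static void main(String[] args) {
        byte[] wellFormed = { (byte) 0xC3, (byte) 0xA8 }; // standard UTF-8 for U+00E8
        byte[] malformed  = { (byte) 0xC3, (byte) 0xE8 }; // 0xE8 is not a continuation byte
        byte[] overlong   = { (byte) 0xC0, (byte) 0x80 }; // longer-than-necessary form of U+0000

        System.out.println("c3 a8 -> " + decodeStrict(wellFormed));
        System.out.println("c3 e8 -> " + decodeStrict(malformed));
        System.out.println("c0 80 -> " + decodeStrict(overlong));
    }
}
```

<div>The helper returns the CoderResult instead of throwing, which makes it easy to compare well-formed and malformed inputs side by side.</div>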