<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>See also this:<br>
      <a class="moz-txt-link-freetext" href="https://en.wikipedia.org/wiki/Unicode_equivalence">https://en.wikipedia.org/wiki/Unicode_equivalence</a></p>
    <p>-- Jon</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 3/5/23 3:12 PM, Archie Cobbs wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CANSoFxv+uqaMuZUCKjnFzZ_vWw5XaUAKjcT9NrjR3KFw8-_4xA@mail.gmail.com">
      
      <div dir="ltr">
        <div>Hi Jon,</div>
        <div><br>
        </div>
        <div>Thanks for taking a look at the patch.<br>
        </div>
        <div dir="ltr"><br>
        </div>
        On Fri, Mar 3, 2023 at 5:07 PM Jonathan Gibbons &lt;<a href="mailto:jonathan.gibbons@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">jonathan.gibbons@oracle.com</a>&gt;
        wrote:
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>I would give you inline code comments, except that it's
              not a PR yet.  I note that I generally distrust the
              `getMessage` for any exception for which the message is
              not formally specified in some way ... in other words,
              don't assume that `e.getMessage()` by itself is
              interesting. </div>
          </blockquote>
          <div><br>
          </div>
          <div> That makes sense, and is easy to fix - thanks for the
            suggestion.<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p>Is it possible to write a test for the bug fix in
                PoolReader?   What is an example of a name encoded in
                two different ways?</p>
            </div>
          </blockquote>
          <div>In any multi-byte UTF-8 sequence, every byte after the
            first is supposed to have the binary form <span style="font-family:monospace">10xxxxxx</span>. But the
            code does not check that, so, e.g., you could have <span style="font-family:monospace">11xxxxxx</span> instead
            and it would decode to the same character but not match
            byte-for-byte. For example, è = <span style="font-family:monospace">c3 a8</span>, but <span style="font-family:monospace">Convert.java</span> would
            also accept <span style="font-family:monospace">c3 e8</span>
            or <span style="font-family:monospace">c3 28</span> for
            "è".</div>
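          <div>A minimal sketch of the effect described above. The class and
            method names are hypothetical and this is not the actual code in
            Convert.java; it only assumes a decoder that masks the low six
            bits of the second byte without checking its top two bits.</div>

```java
class LenientUtf8 {
    // Hypothetical lenient decoder for a 2-byte sequence: it keeps the low
    // six bits of the second byte but never checks that its top bits are 10.
    static char decodeTwoByteLeniently(int b1, int b2) {
        return (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
    }

    // Strict rule from the UTF-8 format: a continuation byte is 10xxxxxx.
    static boolean isContinuation(int b) {
        return (b & 0xc0) == 0x80;
    }

    public static void main(String[] args) {
        int[] secondBytes = {0xa8, 0xe8, 0x28};
        for (int b2 : secondBytes) {
            char c = decodeTwoByteLeniently(0xc3, b2);
            System.out.printf("c3 %02x -> U+%04X, valid continuation: %b%n",
                    b2, (int) c, isContinuation(b2));
        }
    }
}
```

          <div>All three second bytes share the low six bits
            <span style="font-family:monospace">101000</span>, so the lenient
            decoder maps each pair to U+00E8, while only
            <span style="font-family:monospace">a8</span> passes the strict
            continuation check.</div>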
          <div><br>
          </div>
          <div>Because the Name hash tables store UTF-8 byte sequences,
            if the same Name were encoded two different ways, it would
            get added to the hash table twice.</div>
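          <div>To illustrate the duplicate-entry problem, here is a small
            sketch that assumes a table keyed purely by raw bytes. It uses a
            plain HashSet of ByteBuffer keys as a stand-in; the class name is
            hypothetical and this is not javac's Name table
            implementation.</div>

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

class ByteKeyedNames {
    // Counts the distinct entries produced when names are keyed by their
    // raw byte sequences rather than by the characters they decode to.
    static int distinctEntries(byte[]... encodings) {
        Set<ByteBuffer> table = new HashSet<>();
        for (byte[] encoding : encodings) {
            // ByteBuffer.equals and hashCode compare the remaining bytes.
            table.add(ByteBuffer.wrap(encoding));
        }
        return table.size();
    }

    public static void main(String[] args) {
        byte[] canonical = {(byte) 0xc3, (byte) 0xa8}; // well-formed encoding of è
        byte[] malformed = {(byte) 0xc3, (byte) 0xe8}; // accepted by a lenient decoder
        System.out.println(distinctEntries(canonical, malformed));
    }
}
```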
          <div><br>
          </div>
          <div>Another way this can happen is e.g. encoding a character
            as a 3-byte sequence when the character is actually small
            enough to fit in a 2-byte sequence. For example, <span style="font-family:monospace">e0 84 80</span> encodes
            character <span style="font-family:monospace">0x0100</span>,
            but it should really be encoded as <span style="font-family:monospace">c4 80</span>.<br>
          </div>
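          <div>The overlong case can be checked with a few lines of
            arithmetic. This is only a sketch with hypothetical names; it
            relies on the UTF-8 rule that a 3-byte sequence is the shortest
            form only for code points of 0x0800 and above, and it uses the
            JDK's own encoder to show the shortest form.</div>

```java
import java.nio.charset.StandardCharsets;

class OverlongCheck {
    // Extracts the code point from a 3-byte sequence by masking only,
    // without validating the lead or continuation bits.
    static int decodeThreeBytes(int b1, int b2, int b3) {
        return ((b1 & 0x0f) << 12) | ((b2 & 0x3f) << 6) | (b3 & 0x3f);
    }

    // A 3-byte sequence is overlong if its code point fits in fewer bytes.
    static boolean isOverlongThreeByte(int codePoint) {
        return codePoint < 0x800;
    }

    // Formats bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = decodeThreeBytes(0xe0, 0x84, 0x80);
        System.out.printf("U+%04X overlong: %b%n", cp, isOverlongThreeByte(cp));
        // The standard encoder produces the shortest form for the same character.
        System.out.println(hex("\u0100".getBytes(StandardCharsets.UTF_8)));
    }
}
```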
          <div><br>
          </div>
          <div>Thinking more about this, I think I should create a
            separate bug and patch for this particular problem. So,
            expect a digression on that next...<br>
          </div>
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p>Although conceptually simple, this is a significant
                change for a very low level data type. It would be worth
                doing more testing than just the usual langtools tests. 
                For example, if you build JDK before and after this
                change, are the generated class files the same?</p>
            </div>
          </blockquote>
          <div>Definitely a test worth doing.<br>
          </div>
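          <div>One way such a before/after comparison could be scripted is
            sketched below. The class name and the two directory arguments are
            hypothetical, Files.mismatch requires Java 12 or later, and the
            sketch only reports files present in the first tree; it is not an
            existing JDK build tool.</div>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class ClassFileDiff {
    // Returns the relative paths of .class files under 'before' that are
    // missing from 'after' or whose bytes differ. Files that exist only
    // under 'after' are not reported.
    static List<Path> differingClassFiles(Path before, Path after) throws IOException {
        List<Path> differences = new ArrayList<>();
        try (Stream<Path> files = Files.walk(before)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(file) || !file.toString().endsWith(".class")) {
                    continue;
                }
                Path relative = before.relativize(file);
                Path other = after.resolve(relative);
                // Files.mismatch returns -1 when the contents are identical.
                if (!Files.isRegularFile(other) || Files.mismatch(file, other) != -1L) {
                    differences.add(relative);
                }
            }
        }
        return differences;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ClassFileDiff <before-dir> <after-dir>");
            return;
        }
        differingClassFiles(Path.of(args[0]), Path.of(args[1])).forEach(System.out::println);
    }
}
```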
          <div><br>
          </div>
          <div>-Archie<br>
          </div>
        </div>
        <br>
        -- <br>
        <div dir="ltr">Archie L. Cobbs<br>
        </div>
      </div>
    </blockquote>
  </body>
</html>