<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    Hi Brett,<br>
    <br>
    Extra care is needed if the input array might be modified
    concurrently with the method execution.<br>
    When control flow decisions are made based on array contents, the
    integrity of the result depends on reading each byte of the array
    exactly once.<br>
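    <br>
    A minimal sketch of one way to keep that property with the prefix-copy idea quoted below: take a private snapshot of the input, then base every decision on the snapshot only. The class name, the method name, and the plain-Java <code>countPositives</code> (a stand-in for the JDK-internal StringCoding.countPositives) are assumptions for illustration, not the JDK implementation.<br>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SnapshotEncode {
    // Hypothetical stand-in for the JDK-internal StringCoding.countPositives:
    // returns the length of the leading run of non-negative bytes.
    static int countPositives(byte[] a, int off, int len) {
        for (int i = 0; i < len; i++) {
            if (a[off + i] < 0) {
                return i;
            }
        }
        return len;
    }

    static byte[] encodeLatin1AsUtf8(byte[] val) {
        // The only read of the possibly shared array. Everything below
        // inspects the private copy, so a concurrent writer cannot make
        // the scan and the copy disagree.
        byte[] snap = val.clone();
        int positives = countPositives(snap, 0, snap.length);
        if (positives == snap.length) {
            return snap; // already a private copy, no second clone needed
        }
        byte[] dst = new byte[snap.length * 2];
        System.arraycopy(snap, 0, dst, 0, positives);
        int dp = positives;
        for (int i = positives; i < snap.length; i++) {
            byte c = snap[i];
            if (c < 0) {
                dst[dp++] = (byte) (0xc0 | ((c & 0xff) >> 6));
                dst[dp++] = (byte) (0x80 | (c & 0x3f));
            } else {
                dst[dp++] = c;
            }
        }
        return Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = encodeLatin1AsUtf8(latin1);
        System.out.println(Arrays.equals(utf8, "caf\u00e9".getBytes(StandardCharsets.UTF_8)));
    }
}
```

    The trade-off in this sketch is an extra copy on the non-ASCII path. The all-positive path still costs a single clone.<br>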
    <br>
    Regards, Roger<br>
    <br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 7/27/25 4:45 PM, Brett Okken wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CANBJVOFLAWc_XqqT9vK4m4F6UG6_My3QsrBqgo+odz3gih7OrQ@mail.gmail.com">
      
      <div dir="ltr">
        <div>In String.encodeUTF8, when the coder is latin1, there is a
          call to StringCoding.hasNegatives to determine whether any special
          handling is needed. If not, a clone of val is returned.</div>
        <div>If there are negative values, it then loops through all the
          values, from the beginning, and expands each negative value
          into two bytes.</div>
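        <div>As a worked example of that per-byte handling: a negative latin1 byte is a code point in the range U+0080 to U+00FF, which UTF-8 encodes as two bytes of the form 110xxxxx 10xxxxxx. The sketch below applies the same two expressions used in the loop to a single byte. The class and method names are hypothetical and only serve the illustration.</div>

```java
final class Latin1TwoByteDemo {
    // Expands one negative latin1 byte into its two-byte UTF-8 form.
    static byte[] expand(byte c) {
        byte lead = (byte) (0xc0 | ((c & 0xff) >> 6)); // top 2 bits of the code point
        byte trail = (byte) (0x80 | (c & 0x3f));       // low 6 bits of the code point
        return new byte[] { lead, trail };
    }

    public static void main(String[] args) {
        // 0xE9 is e-acute in latin1; as a Java byte it is -23.
        byte[] out = expand((byte) 0xE9);
        // prints "C3 A9"
        System.out.println(String.format("%02X %02X", out[0] & 0xff, out[1] & 0xff));
    }
}
```

        <div>The <code>c &amp; 0xff</code> mask matters for the lead byte: without it, the sign-extended value would shift in one bits and the result would be wrong.</div>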
        <div><br>
        </div>
        <div>Would it be better to call StringCoding.countPositives? If
          the result equals the length, the clone can still be returned.
          If it does not, the leading positive values can simply be
          copied to the target byte[], and only the values from that
          point on need to be checked again.</div>
        <div><br>
        </div>
        <div><a href="https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L1287-L1300" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L1287-L1300</a></div>
        <div><br>
        </div>
        <div>        if (!StringCoding.hasNegatives(val, 0, val.length))
          {<br>
                      return val.clone();<br>
                  }<br>
          <br>
                  int dp = 0;<br>
                  byte[] dst = StringUTF16.newBytesFor(val.length);<br>
                  for (byte c : val) {<br>
                      if (c < 0) {<br>
                          dst[dp++] = (byte) (0xc0 | ((c & 0xff)
          >> 6));<br>
                          dst[dp++] = (byte) (0x80 | (c & 0x3f));<br>
                      } else {<br>
                          dst[dp++] = c;<br>
                      }<br>
                  }</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Can be changed to look like:<br>
        </div>
        <div><br>
        </div>
        <div>        int positives = StringCoding.countPositives(val, 0,
          val.length);<br>
                  if (positives == val.length) {<br>
                      return val.clone();<br>
                  }</div>
        <div><br>
        </div>
        <div>        int dp = positives;<br>
                  byte[] dst = StringUTF16.newBytesFor(val.length);</div>
        <div>        if (positives > 0) {<br>
                      System.arraycopy(val, 0, dst, 0, positives);<br>
                  }<br>
                  for (int i=dp; i<val.length; ++i) {<br>
                      byte c = val[i]; 
          <div>            if (c < 0) {<br>
                            dst[dp++] = (byte) (0xc0 | ((c & 0xff)
            >> 6));<br>
                            dst[dp++] = (byte) (0x80 | (c & 0x3f));<br>
                        } else {<br>
                            dst[dp++] = c;<br>
                        }<br>
                    }</div>
          <div><br>
          </div>
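          <div>One way to gain some confidence that the two shapes above produce the same bytes is a small standalone comparison, sketched here under stated assumptions: <code>countPositives</code> is a hypothetical plain-Java stand-in for the JDK-internal method, <code>new byte[val.length &lt;&lt; 1]</code> stands in for StringUTF16.newBytesFor, the loop variant omits the hasNegatives fast path, and both methods trim the result with Arrays.copyOf so the outputs are comparable.</div>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodeEquivalenceCheck {
    // Hypothetical stand-in for the JDK-internal StringCoding.countPositives.
    static int countPositives(byte[] a, int off, int len) {
        for (int i = 0; i < len; i++) {
            if (a[off + i] < 0) {
                return i;
            }
        }
        return len;
    }

    // Existing shape: loop over every byte from index 0.
    static byte[] encodeLoop(byte[] val) {
        byte[] dst = new byte[val.length << 1];
        int dp = 0;
        for (byte c : val) {
            if (c < 0) {
                dst[dp++] = (byte) (0xc0 | ((c & 0xff) >> 6));
                dst[dp++] = (byte) (0x80 | (c & 0x3f));
            } else {
                dst[dp++] = c;
            }
        }
        return Arrays.copyOf(dst, dp);
    }

    // Proposed shape: bulk-copy the positive prefix, loop over the rest.
    static byte[] encodePrefixCopy(byte[] val) {
        int positives = countPositives(val, 0, val.length);
        if (positives == val.length) {
            return val.clone();
        }
        byte[] dst = new byte[val.length << 1];
        System.arraycopy(val, 0, dst, 0, positives);
        int dp = positives;
        for (int i = positives; i < val.length; i++) {
            byte c = val[i];
            if (c < 0) {
                dst[dp++] = (byte) (0xc0 | ((c & 0xff) >> 6));
                dst[dp++] = (byte) (0x80 | (c & 0x3f));
            } else {
                dst[dp++] = c;
            }
        }
        return Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        // Non-ASCII at the start, at the end, and mixed, plus the trivial cases.
        String[] samples = { "", "ascii only", "\u00e9 at the start",
                             "at the end \u00e9", "m\u00efx\u00e9d" };
        for (String s : samples) {
            byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
            boolean same = Arrays.equals(encodeLoop(latin1), encodePrefixCopy(latin1));
            System.out.println(same);
        }
    }
}
```

          <div>This check only covers single-threaded equivalence on a handful of inputs. It says nothing about performance.</div>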
          <div><br>
          </div>
          <div><br>
          </div>
          <div>I have done a bit of testing with the StringEncode JMH
            benchmark on my local Windows machine.</div>
          <div><br>
          </div>
          encodeLatin1LongEnd speeds up significantly (~70%)</div>
        <div>encodeLatin1LongStart slows down (~20%)</div>
        <div>encodeLatin1Mixed speeds up by ~30%</div>
        <div><br>
        </div>
        <div>The remaining tests do not show much difference either way.</div>
        <div><br>
        </div>
        <div>Brett</div>
      </div>
    </blockquote>
    <br>
  </body>
</html>