<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Hi,
<div class=""><br class="">
</div>
<div class="">general comment: String might be one of the trickier places to add a VarHandle dependency, since String is used very early in the bootstrap and depended upon by everything else, so I’d expect such a solution would have to navigate potential circularity
issues carefully. It’d be good to experiment with changes to java.lang.String proper to see that the solution that works nice externally is or can be made feasible within String.</div>
<div class=""><br class="">
</div>
<div class="">Specifically on the performance opportunity then while US-ASCII encoding is probably on the way out we shouldn’t neglect it.</div>
<div class=""><br class="">
</div>
<div class="">One way to go about this without pulling VarHandles into String might be to use what other encode methods in String does and leverage StringCoding.countPositives:</div>
<div class=""><br class="">
</div>
<div class=""><a href="https://github.com/openjdk/jdk/pull/12640" class="">https://github.com/openjdk/jdk/pull/12640</a></div>
<div class=""><br class="">
</div>
<div class="">Testing this on the existing StringEncode microbenchmark, shows a promising speed-up when the input is ASCII-encodable:</div>
<div class=""><br class="">
</div>
<div class="">Baseline</div>
<div class="">
<div class="">Benchmark (charsetName) Mode Cnt Score Error Units</div>
<div class="">StringEncode.encodeAsciiLong US-ASCII avgt 5 26626,025 ± 448,307 ns/op</div>
<div class="">StringEncode.encodeAsciiShort US-ASCII avgt 5 33,336 ± 2,032 ns/op</div>
</div>
<div class=""><br class="">
</div>
<div class="">Patch:</div>
<div class="">
<div class="">Benchmark (charsetName) Mode Cnt Score Error Units</div>
<div class="">
<div class="">StringEncode.encodeAsciiLong US-ASCII avgt 5 5492,985 ± 40,066 ns/op</div>
</div>
<div class="">StringEncode.encodeAsciiShort US-ASCII avgt 5 28,545 ± 4,883 ns/op</div>
</div>
<div class="">
<div><br class="">
</div>
<div>(Note that this falls back to a scalar loop on encoding failures. That loop could still benefit from unrolling or a byteArrayViewVarHandle, but an application with a lot of encoding failures likely has a bigger problem than raw performance...)</div>
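<div>If that fallback ever did matter, one option would be to check eight bytes at a time from the first non-ASCII index onward, in the spirit of the byteArrayViewVarHandle approach quoted below. This is a sketch only, written outside String where VarHandles pose no bootstrap problem; the class, method and constant names are hypothetical.</div>

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class WordFallbackSketch {

    // Views a byte[] as longs; byte order does not matter for the sign-bit test below.
    private static final VarHandle LONG_BYTES =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // The sign bit of each of the 8 bytes in a long.
    private static final long NEG_MASK = 0x8080808080808080L;

    // Replaces every negative byte in dst[from, dst.length) with '?'.
    // dst is expected to already hold a copy of the LATIN1 input.
    static void replaceNegatives(byte[] dst, int from) {
        int i = from;
        // Test 8 bytes per iteration; scan bytes individually only when a word has a sign bit set.
        for (; i + 8 <= dst.length; i += 8) {
            long word = (long) LONG_BYTES.get(dst, i);
            if ((word & NEG_MASK) != 0) {
                for (int x = i; x < i + 8; x++) {
                    if (dst[x] < 0) {
                        dst[x] = '?';
                    }
                }
            }
        }
        // Scalar tail for the last 0-7 bytes.
        for (; i < dst.length; i++) {
            if (dst[i] < 0) {
                dst[i] = '?';
            }
        }
    }
}
```

<div>A caller would copy the input, locate the first negative byte with StringCoding.countPositives, and pass that index as <code>from</code>.</div>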
<div><br class="">
</div>
<div>WDYT?</div>
<div><br class="">
</div>
<div>/Claes</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">18 feb. 2023 kl. 19:36 skrev Brett Okken <<a href="mailto:brett.okken.os@gmail.com" class="">brett.okken.os@gmail.com</a>>:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class=""><a href="https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981" class="">https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981</a><br class="">
<br class="">
For String.encodeASCII with the LATIN1 coder, is there any interest in<br class="">
exploring the performance impact of using a<br class="">
byteArrayViewVarHandle to read/write longs and a bitmask to<br class="">
identify whether negative values are present?<br class="">
<br class="">
A simple JMH benchmark covering either 0 or 1 non-ASCII (negative)<br class="">
values shows times cut in half (or more) for most scenarios, with<br class="">
strings ranging in length from 3 to ~2000.<br class="">
VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10<br class="">
Windows 10, Intel(R) Core(TM) i7-9850H<br class="">
<br class="">
Hand-unrolling the loops shows a notable improvement, but makes for<br class="">
less aesthetically pleasing code.<br class="">
<br class="">
<br class="">
<pre class="">
Benchmark                                  (nonascii)  (size)  Mode  Cnt     Score     Error  Units
AsciiEncodeBenchmark.jdk                         none       3  avgt    4    15.531 ±   1.122  ns/op
AsciiEncodeBenchmark.jdk                         none      10  avgt    4    17.350 ±   0.473  ns/op
AsciiEncodeBenchmark.jdk                         none      16  avgt    4    18.277 ±   0.421  ns/op
AsciiEncodeBenchmark.jdk                         none      23  avgt    4    20.139 ±   0.350  ns/op
AsciiEncodeBenchmark.jdk                         none      33  avgt    4    22.008 ±   0.656  ns/op
AsciiEncodeBenchmark.jdk                         none      42  avgt    4    24.393 ±   1.155  ns/op
AsciiEncodeBenchmark.jdk                         none     201  avgt    4    55.884 ±   0.645  ns/op
AsciiEncodeBenchmark.jdk                         none     511  avgt    4   120.817 ±   2.917  ns/op
AsciiEncodeBenchmark.jdk                         none    2087  avgt    4   471.039 ±  13.329  ns/op
AsciiEncodeBenchmark.jdk                        first       3  avgt    4    15.794 ±   1.494  ns/op
AsciiEncodeBenchmark.jdk                        first      10  avgt    4    18.446 ±   0.780  ns/op
AsciiEncodeBenchmark.jdk                        first      16  avgt    4    20.458 ±   0.394  ns/op
AsciiEncodeBenchmark.jdk                        first      23  avgt    4    22.934 ±   0.422  ns/op
AsciiEncodeBenchmark.jdk                        first      33  avgt    4    25.367 ±   0.178  ns/op
AsciiEncodeBenchmark.jdk                        first      42  avgt    4    28.620 ±   0.678  ns/op
AsciiEncodeBenchmark.jdk                        first     201  avgt    4    80.250 ±   4.376  ns/op
AsciiEncodeBenchmark.jdk                        first     511  avgt    4   185.518 ±   6.370  ns/op
AsciiEncodeBenchmark.jdk                        first    2087  avgt    4   713.213 ±  13.488  ns/op
AsciiEncodeBenchmark.jdk                         last       3  avgt    4    14.991 ±   0.190  ns/op
AsciiEncodeBenchmark.jdk                         last      10  avgt    4    18.284 ±   0.317  ns/op
AsciiEncodeBenchmark.jdk                         last      16  avgt    4    20.591 ±   1.002  ns/op
AsciiEncodeBenchmark.jdk                         last      23  avgt    4    22.560 ±   0.963  ns/op
AsciiEncodeBenchmark.jdk                         last      33  avgt    4    25.521 ±   0.554  ns/op
AsciiEncodeBenchmark.jdk                         last      42  avgt    4    28.484 ±   0.446  ns/op
AsciiEncodeBenchmark.jdk                         last     201  avgt    4    79.434 ±   2.256  ns/op
AsciiEncodeBenchmark.jdk                         last     511  avgt    4   186.639 ±   4.258  ns/op
AsciiEncodeBenchmark.jdk                         last    2087  avgt    4   725.196 ± 149.416  ns/op
AsciiEncodeBenchmark.longCheckCopy               none       3  avgt    4     7.222 ±   0.428  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      10  avgt    4     8.070 ±   0.171  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      16  avgt    4     6.711 ±   0.409  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      23  avgt    4    12.906 ±   3.633  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      33  avgt    4    10.414 ±   0.447  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      42  avgt    4    11.935 ±   1.235  ns/op
AsciiEncodeBenchmark.longCheckCopy               none     201  avgt    4    29.538 ±   3.265  ns/op
AsciiEncodeBenchmark.longCheckCopy               none     511  avgt    4   106.228 ±  68.475  ns/op
AsciiEncodeBenchmark.longCheckCopy               none    2087  avgt    4   494.845 ± 890.985  ns/op
AsciiEncodeBenchmark.longCheckCopy              first       3  avgt    4     7.775 ±   0.278  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      10  avgt    4    13.396 ±   2.072  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      16  avgt    4    13.528 ±   0.702  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      23  avgt    4    17.376 ±   0.360  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      33  avgt    4    16.251 ±   0.203  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      42  avgt    4    17.932 ±   1.773  ns/op
AsciiEncodeBenchmark.longCheckCopy              first     201  avgt    4    39.028 ±   4.699  ns/op
AsciiEncodeBenchmark.longCheckCopy              first     511  avgt    4    92.599 ±  11.078  ns/op
AsciiEncodeBenchmark.longCheckCopy              first    2087  avgt    4   347.728 ±   7.837  ns/op
AsciiEncodeBenchmark.longCheckCopy               last       3  avgt    4     7.472 ±   0.078  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      10  avgt    4     8.371 ±   0.815  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      16  avgt    4     6.766 ±   0.253  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      23  avgt    4    12.879 ±   0.454  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      33  avgt    4    10.491 ±   0.811  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      42  avgt    4    12.435 ±   1.212  ns/op
AsciiEncodeBenchmark.longCheckCopy               last     201  avgt    4    28.507 ±   1.058  ns/op
AsciiEncodeBenchmark.longCheckCopy               last     511  avgt    4    85.763 ±   1.941  ns/op
AsciiEncodeBenchmark.longCheckCopy               last    2087  avgt    4   411.555 ±   3.595  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none       3  avgt    4     5.858 ±   0.637  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      10  avgt    4     7.031 ±   0.274  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      16  avgt    4     6.768 ±   0.222  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      23  avgt    4    10.084 ±   0.102  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      33  avgt    4     9.876 ±   0.240  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      42  avgt    4    11.061 ±   0.590  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none     201  avgt    4    29.264 ±   1.690  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none     511  avgt    4    61.920 ±   5.482  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none    2087  avgt    4   309.183 ±  42.354  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first       3  avgt    4     5.687 ±   0.249  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      10  avgt    4     9.537 ±   0.337  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      16  avgt    4     9.928 ±   0.329  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      23  avgt    4    12.510 ±   0.519  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      33  avgt    4    13.028 ±   0.335  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      42  avgt    4    13.640 ±   0.219  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first     201  avgt    4    31.046 ±   0.647  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first     511  avgt    4    82.998 ±   5.611  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first    2087  avgt    4   360.294 ±   8.419  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last       3  avgt    4     5.657 ±   0.197  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      10  avgt    4     6.997 ±   0.081  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      16  avgt    4     6.890 ±   1.319  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      23  avgt    4    10.154 ±   0.389  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      33  avgt    4     9.986 ±   0.592  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      42  avgt    4    11.481 ±   0.375  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last     201  avgt    4    29.286 ±   0.723  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last     511  avgt    4    61.056 ±   0.977  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last    2087  avgt    4   303.415 ±  17.326  ns/op</pre>
<br class="">
<br class="">
<br class="">
@Benchmark<br class="">
public byte[] jdk() {<br class="">
final byte[] val = this.data;<br class="">
byte[] dst = Arrays.copyOf(val, val.length);<br class="">
for (int i = 0; i < dst.length; i++) {<br class="">
if (dst[i] < 0) {<br class="">
dst[i] = '?';<br class="">
}<br class="">
}<br class="">
return dst;<br class="">
}<br class="">
<br class="">
@Benchmark<br class="">
public byte[] longCheckCopy() {<br class="">
final byte[] val = this.data;<br class="">
byte[] dst = new byte[val.length];<br class="">
int i = 0;<br class="">
long word;<br class="">
for (int j=dst.length - 7; i < j; i+=8) {<br class="">
word = (long)LONG_BYTES.get(val, i);<br class="">
LONG_BYTES.set(dst, i, word);<br class="">
if ((word & LONG_NEG_MASK) != 0) {<br class="">
for (int x=i, y=i+8; x<y; x++) {<br class="">
if (dst[x] < 0) {<br class="">
dst[x] = '?';<br class="">
}<br class="">
}<br class="">
}<br class="">
}<br class="">
byte b;<br class="">
for (; i < dst.length; i++) {<br class="">
b = val[i];<br class="">
dst[i] = b < 0 ? (byte) '?' : b;<br class="">
}<br class="">
return dst;<br class="">
}<br class="">
<br class="">
@Benchmark<br class="">
public byte[] longCheckCopy_unroll() {<br class="">
final byte[] val = this.data;<br class="">
byte[] dst = new byte[val.length];<br class="">
int i = 0;<br class="">
long word;<br class="">
for (int j=dst.length - 7; i < j; i+=8) {<br class="">
word = (long)LONG_BYTES.get(val, i);<br class="">
LONG_BYTES.set(dst, i, word);<br class="">
if ((word & LONG_NEG_MASK) != 0) {<br class="">
if (dst[i] < 0) {<br class="">
dst[i] = '?';<br class="">
}<br class="">
if (dst[i + 1] < 0) {<br class="">
dst[i + 1] = '?';<br class="">
}<br class="">
if (dst[i + 2] < 0) {<br class="">
dst[i + 2] = '?';<br class="">
}<br class="">
if (dst[i + 3] < 0) {<br class="">
dst[i + 3] = '?';<br class="">
}<br class="">
if (dst[i + 4] < 0) {<br class="">
dst[i + 4] = '?';<br class="">
}<br class="">
if (dst[i + 5] < 0) {<br class="">
dst[i + 5] = '?';<br class="">
}<br class="">
if (dst[i + 6] < 0) {<br class="">
dst[i + 6] = '?';<br class="">
}<br class="">
if (dst[i + 7] < 0) {<br class="">
dst[i + 7] = '?';<br class="">
}<br class="">
}<br class="">
}<br class="">
byte b;<br class="">
switch (dst.length & 0x7) {<br class="">
case 7:<br class="">
b = val[i + 6];<br class="">
dst[i + 6] = b < 0 ? (byte) '?' : b;<br class="">
case 6:<br class="">
b = val[i + 5];<br class="">
dst[i + 5] = b < 0 ? (byte) '?' : b;<br class="">
case 5:<br class="">
b = val[i + 4];<br class="">
dst[i + 4] = b < 0 ? (byte) '?' : b;<br class="">
case 4:<br class="">
b = val[i + 3];<br class="">
dst[i + 3] = b < 0 ? (byte) '?' : b;<br class="">
case 3:<br class="">
b = val[i + 2];<br class="">
dst[i + 2] = b < 0 ? (byte) '?' : b;<br class="">
case 2:<br class="">
b = val[i + 1];<br class="">
dst[i + 1] = b < 0 ? (byte) '?' : b;<br class="">
case 1:<br class="">
b = val[i];<br class="">
dst[i] = b < 0 ? (byte) '?' : b;<br class="">
}<br class="">
return dst;<br class="">
}<br class="">
<br class="">
<br class="">
Thanks,<br class="">
<br class="">
Brett<br class="">
</div>
</div>
</blockquote>
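<div class="">On the bitmask in the quoted code: a byte is negative exactly when its top bit is set, so ANDing a long-sized view of eight bytes with 0x8080808080808080L is nonzero if and only if at least one of those bytes is non-ASCII. Because the mask is identical in every byte lane, the result does not depend on the byte order chosen for the view. A small standalone check of that claim (class and method names are hypothetical):</div>

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class NegMaskDemo {

    // Sign bit of each of the 8 bytes in a long.
    static final long NEG_MASK = 0x8080808080808080L;

    // Two views of the same byte[] as longs, differing only in byte order.
    static final VarHandle LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle BE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    // True if any of the 8 bytes starting at off is negative, read through the given view.
    static boolean hasNegative(VarHandle view, byte[] a, int off) {
        long word = (long) view.get(a, off);
        return (word & NEG_MASK) != 0;
    }

    public static void main(String[] args) {
        byte[] ascii = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'};
        System.out.println(hasNegative(LE, ascii, 0) + " " + hasNegative(BE, ascii, 0)); // false false
        for (int pos = 0; pos < 8; pos++) {
            byte[] a = ascii.clone();
            a[pos] = (byte) 0xE9; // one non-ASCII byte at position pos
            System.out.println(pos + ": " + hasNegative(LE, a, 0) + " " + hasNegative(BE, a, 0));
        }
    }
}
```
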
</div>
<br class="">
</div>
</body>
</html>