String encodeASCII
Claes Redestad
claes.redestad at oracle.com
Mon Feb 20 11:39:21 UTC 2023
Unsafe might be similarly tricky dependency-wise, but perhaps less so. Either solution might work fine if we extract the code for encoding to a utility class that isn’t initialized eagerly with String.class itself.
I’ll file an RFE and get the ”trivial” fix to use StringCoding.countPositives in to establish a better baseline. I’m sure there are ways to beat this, for example with a fused loop. An unrolled and/or VH-based solution like you’ve proposed could at the very least be used to speedup the case of Strings for which we need to replace with ’?’.
/Claes
19 feb. 2023 kl. 02:54 skrev Brett Okken <brett.okken.os at gmail.com<mailto:brett.okken.os at gmail.com>>:
Thanks for taking a look at this.
That is a pretty good improvement with a much smaller change.
I recognize the intricacies of bootstrapping, but have no expertise. Would using Unsafe directly be easier? Particularly for shorter strings, doing the copying and checking together rather than as separate loops seems to speed things up considerably, even for happy-path of no failures.
Brett
On Sat, Feb 18, 2023 at 5:49 PM Claes Redestad <claes.redestad at oracle.com<mailto:claes.redestad at oracle.com>> wrote:
Hi,
general comment: String might be one of the trickier places to add a VarHandle dependency, since String is used very early in the bootstrap and depended upon by everything else, so I’d expect such a solution would have to navigate potential circularity issues carefully. It’d be good to experiment with changes to java.lang.String proper to see that the solution that works nice externally is or can be made feasible within String.
Specifically on the performance opportunity then while US-ASCII encoding is probably on the way out we shouldn’t neglect it.
One way to go about this without pulling VarHandles into String might be to use what other encode methods in String does and leverage StringCoding.countPositives:
https://github.com/openjdk/jdk/pull/12640<https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/12640__;!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Zp4JhLWU$>
Testing this on the existing StringEncode microbenchmark, shows a promising speed-up when the input is ASCII-encodable:
Baseline
Benchmark (charsetName) Mode Cnt Score Error Units
StringEncode.encodeAsciiLong US-ASCII avgt 5 26626,025 ± 448,307 ns/op
StringEncode.encodeAsciiShort US-ASCII avgt 5 33,336 ± 2,032 ns/op
Patch:
Benchmark (charsetName) Mode Cnt Score Error Units
StringEncode.encodeAsciiLong US-ASCII avgt 5 5492,985 ± 40,066 ns/op
StringEncode.encodeAsciiShort US-ASCII avgt 5 28,545 ± 4,883 ns/op
(You might see that this will go back into a scalar loop on encoding failures. That loop could still benefit from unrolling or byteArrayViewVarHandle, but I think you have a bigger problem in an app than raw performance if you have a lot of encoding failures...)
WDYT?
/Claes
18 feb. 2023 kl. 19:36 skrev Brett Okken <brett.okken.os at gmail.com<mailto:brett.okken.os at gmail.com>>:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981<https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java*L976-L981__;Iw!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Z_Ook_Ws$>
For String.encodeASCII, with the LATIN1 coder is there any interest in
exploring the performance impacts of utilizing a
byteArrayViewVarHandle to read/write as longs and utilize a bitmask to
identify if negative values are present?
A simple jmh benchmark covering either 0 or 1 non-ascii (negative)
values shows times cut in half (or more) for most scenarios with
strings ranging in length from 3 - ~2000.
VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10
Windows 10 Intel(R) Core(TM) i7-9850H
Hand unrolling the loops shows noted improvement, but does make for
less aesthetically pleasing code.
Benchmark (nonascii) (size) Mode
Cnt Score Error Units
AsciiEncodeBenchmark.jdk none 3 avgt
4 15.531 ± 1.122 ns/op
AsciiEncodeBenchmark.jdk none 10 avgt
4 17.350 ± 0.473 ns/op
AsciiEncodeBenchmark.jdk none 16 avgt
4 18.277 ± 0.421 ns/op
AsciiEncodeBenchmark.jdk none 23 avgt
4 20.139 ± 0.350 ns/op
AsciiEncodeBenchmark.jdk none 33 avgt
4 22.008 ± 0.656 ns/op
AsciiEncodeBenchmark.jdk none 42 avgt
4 24.393 ± 1.155 ns/op
AsciiEncodeBenchmark.jdk none 201 avgt
4 55.884 ± 0.645 ns/op
AsciiEncodeBenchmark.jdk none 511 avgt
4 120.817 ± 2.917 ns/op
AsciiEncodeBenchmark.jdk none 2087 avgt
4 471.039 ± 13.329 ns/op
AsciiEncodeBenchmark.jdk first 3 avgt
4 15.794 ± 1.494 ns/op
AsciiEncodeBenchmark.jdk first 10 avgt
4 18.446 ± 0.780 ns/op
AsciiEncodeBenchmark.jdk first 16 avgt
4 20.458 ± 0.394 ns/op
AsciiEncodeBenchmark.jdk first 23 avgt
4 22.934 ± 0.422 ns/op
AsciiEncodeBenchmark.jdk first 33 avgt
4 25.367 ± 0.178 ns/op
AsciiEncodeBenchmark.jdk first 42 avgt
4 28.620 ± 0.678 ns/op
AsciiEncodeBenchmark.jdk first 201 avgt
4 80.250 ± 4.376 ns/op
AsciiEncodeBenchmark.jdk first 511 avgt
4 185.518 ± 6.370 ns/op
AsciiEncodeBenchmark.jdk first 2087 avgt
4 713.213 ± 13.488 ns/op
AsciiEncodeBenchmark.jdk last 3 avgt
4 14.991 ± 0.190 ns/op
AsciiEncodeBenchmark.jdk last 10 avgt
4 18.284 ± 0.317 ns/op
AsciiEncodeBenchmark.jdk last 16 avgt
4 20.591 ± 1.002 ns/op
AsciiEncodeBenchmark.jdk last 23 avgt
4 22.560 ± 0.963 ns/op
AsciiEncodeBenchmark.jdk last 33 avgt
4 25.521 ± 0.554 ns/op
AsciiEncodeBenchmark.jdk last 42 avgt
4 28.484 ± 0.446 ns/op
AsciiEncodeBenchmark.jdk last 201 avgt
4 79.434 ± 2.256 ns/op
AsciiEncodeBenchmark.jdk last 511 avgt
4 186.639 ± 4.258 ns/op
AsciiEncodeBenchmark.jdk last 2087 avgt
4 725.196 ± 149.416 ns/op
AsciiEncodeBenchmark.longCheckCopy none 3 avgt
4 7.222 ± 0.428 ns/op
AsciiEncodeBenchmark.longCheckCopy none 10 avgt
4 8.070 ± 0.171 ns/op
AsciiEncodeBenchmark.longCheckCopy none 16 avgt
4 6.711 ± 0.409 ns/op
AsciiEncodeBenchmark.longCheckCopy none 23 avgt
4 12.906 ± 3.633 ns/op
AsciiEncodeBenchmark.longCheckCopy none 33 avgt
4 10.414 ± 0.447 ns/op
AsciiEncodeBenchmark.longCheckCopy none 42 avgt
4 11.935 ± 1.235 ns/op
AsciiEncodeBenchmark.longCheckCopy none 201 avgt
4 29.538 ± 3.265 ns/op
AsciiEncodeBenchmark.longCheckCopy none 511 avgt
4 106.228 ± 68.475 ns/op
AsciiEncodeBenchmark.longCheckCopy none 2087 avgt
4 494.845 ± 890.985 ns/op
AsciiEncodeBenchmark.longCheckCopy first 3 avgt
4 7.775 ± 0.278 ns/op
AsciiEncodeBenchmark.longCheckCopy first 10 avgt
4 13.396 ± 2.072 ns/op
AsciiEncodeBenchmark.longCheckCopy first 16 avgt
4 13.528 ± 0.702 ns/op
AsciiEncodeBenchmark.longCheckCopy first 23 avgt
4 17.376 ± 0.360 ns/op
AsciiEncodeBenchmark.longCheckCopy first 33 avgt
4 16.251 ± 0.203 ns/op
AsciiEncodeBenchmark.longCheckCopy first 42 avgt
4 17.932 ± 1.773 ns/op
AsciiEncodeBenchmark.longCheckCopy first 201 avgt
4 39.028 ± 4.699 ns/op
AsciiEncodeBenchmark.longCheckCopy first 511 avgt
4 92.599 ± 11.078 ns/op
AsciiEncodeBenchmark.longCheckCopy first 2087 avgt
4 347.728 ± 7.837 ns/op
AsciiEncodeBenchmark.longCheckCopy last 3 avgt
4 7.472 ± 0.078 ns/op
AsciiEncodeBenchmark.longCheckCopy last 10 avgt
4 8.371 ± 0.815 ns/op
AsciiEncodeBenchmark.longCheckCopy last 16 avgt
4 6.766 ± 0.253 ns/op
AsciiEncodeBenchmark.longCheckCopy last 23 avgt
4 12.879 ± 0.454 ns/op
AsciiEncodeBenchmark.longCheckCopy last 33 avgt
4 10.491 ± 0.811 ns/op
AsciiEncodeBenchmark.longCheckCopy last 42 avgt
4 12.435 ± 1.212 ns/op
AsciiEncodeBenchmark.longCheckCopy last 201 avgt
4 28.507 ± 1.058 ns/op
AsciiEncodeBenchmark.longCheckCopy last 511 avgt
4 85.763 ± 1.941 ns/op
AsciiEncodeBenchmark.longCheckCopy last 2087 avgt
4 411.555 ± 3.595 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 3 avgt
4 5.858 ± 0.637 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 10 avgt
4 7.031 ± 0.274 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 16 avgt
4 6.768 ± 0.222 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 23 avgt
4 10.084 ± 0.102 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 33 avgt
4 9.876 ± 0.240 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 42 avgt
4 11.061 ± 0.590 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 201 avgt
4 29.264 ± 1.690 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 511 avgt
4 61.920 ± 5.482 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll none 2087 avgt
4 309.183 ± 42.354 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 3 avgt
4 5.687 ± 0.249 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 10 avgt
4 9.537 ± 0.337 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 16 avgt
4 9.928 ± 0.329 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 23 avgt
4 12.510 ± 0.519 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 33 avgt
4 13.028 ± 0.335 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 42 avgt
4 13.640 ± 0.219 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 201 avgt
4 31.046 ± 0.647 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 511 avgt
4 82.998 ± 5.611 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll first 2087 avgt
4 360.294 ± 8.419 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 3 avgt
4 5.657 ± 0.197 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 10 avgt
4 6.997 ± 0.081 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 16 avgt
4 6.890 ± 1.319 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 23 avgt
4 10.154 ± 0.389 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 33 avgt
4 9.986 ± 0.592 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 42 avgt
4 11.481 ± 0.375 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 201 avgt
4 29.286 ± 0.723 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 511 avgt
4 61.056 ± 0.977 ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll last 2087 avgt
4 303.415 ± 17.326 ns/op
@Benchmark
public byte[] jdk() {
final byte[] val = this.data;
byte[] dst = Arrays.copyOf(val, val.length);
for (int i = 0; i < dst.length; i++) {
if (dst[i] < 0) {
dst[i] = '?';
}
}
return dst;
}
@Benchmark
public byte[] longCheckCopy() {
final byte[] val = this.data;
byte[] dst = new byte[val.length];
int i = 0;
long word;
for (int j=dst.length - 7; i < j; i+=8) {
word = (long)LONG_BYTES.get(val, i);
LONG_BYTES.set(dst, i, word);
if ((word & LONG_NEG_MASK) != 0) {
for (int x=i, y=i+8; x<y; x++) {
if (dst[x] < 0) {
dst[x] = '?';
}
}
}
}
byte b;
for (; i < dst.length; i++) {
b = val[i];
dst[i] = b < 0 ? (byte) '?' : b;
}
return dst;
}
@Benchmark
public byte[] longCheckCopy_unroll() {
final byte[] val = this.data;
byte[] dst = new byte[val.length];
int i = 0;
long word;
for (int j=dst.length - 7; i < j; i+=8) {
word = (long)LONG_BYTES.get(val, i);
LONG_BYTES.set(dst, i, word);
if ((word & LONG_NEG_MASK) != 0) {
if (dst[i] < 0) {
dst[i] = '?';
}
if (dst[i + 1] < 0) {
dst[i + 1] = '?';
}
if (dst[i + 2] < 0) {
dst[i + 2] = '?';
}
if (dst[i + 3] < 0) {
dst[i + 3] = '?';
}
if (dst[i + 4] < 0) {
dst[i + 4] = '?';
}
if (dst[i + 5] < 0) {
dst[i + 5] = '?';
}
if (dst[i + 6] < 0) {
dst[i + 6] = '?';
}
if (dst[i + 7] < 0) {
dst[i + 7] = '?';
}
}
}
byte b;
switch (dst.length & 0x7) {
case 7:
b = val[i + 6];
dst[i + 6] = b < 0 ? (byte) '?' : b;
case 6:
b = val[i + 5];
dst[i + 5] = b < 0 ? (byte) '?' : b;
case 5:
b = val[i + 4];
dst[i + 4] = b < 0 ? (byte) '?' : b;
case 4:
b = val[i + 3];
dst[i + 3] = b < 0 ? (byte) '?' : b;
case 3:
b = val[i + 2];
dst[i + 2] = b < 0 ? (byte) '?' : b;
case 2:
b = val[i + 1];
dst[i + 1] = b < 0 ? (byte) '?' : b;
case 1:
b = val[i];
dst[i] = b < 0 ? (byte) '?' : b;
}
return dst;
}
Thanks,
Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20230220/c7d5751f/attachment-0001.htm>
More information about the core-libs-dev
mailing list