String encodeASCII
Brett Okken
brett.okken.os at gmail.com
Sun Feb 19 01:54:31 UTC 2023
Thanks for taking a look at this.
That is a pretty good improvement with a much smaller change.
I recognize the intricacies of bootstrapping, but have no expertise. Would
using Unsafe directly be easier? Particularly for shorter strings, doing
the copying and checking together rather than as separate loops seems to
speed things up considerably, even for happy-path of no failures.
Brett
On Sat, Feb 18, 2023 at 5:49 PM Claes Redestad <claes.redestad at oracle.com>
wrote:
> Hi,
>
> general comment: String might be one of the trickier places to add a
> VarHandle dependency, since String is used very early in the bootstrap and
> depended upon by everything else, so I’d expect such a solution would have
> to navigate potential circularity issues carefully. It’d be good to
> experiment with changes to java.lang.String proper to see that the solution
> that works nice externally is or can be made feasible within String.
>
> Specifically on the performance opportunity then while US-ASCII encoding
> is probably on the way out we shouldn’t neglect it.
>
> One way to go about this without pulling VarHandles into String might be
> to use what other encode methods in String does and leverage
> StringCoding.countPositives:
>
> https://github.com/openjdk/jdk/pull/12640
>
> Testing this on the existing StringEncode microbenchmark, shows a
> promising speed-up when the input is ASCII-encodable:
>
> Baseline
> Benchmark (charsetName) Mode Cnt Score
> Error Units
> StringEncode.encodeAsciiLong US-ASCII avgt 5 26626,025 ±
> 448,307 ns/op
> StringEncode.encodeAsciiShort US-ASCII avgt 5 33,336 ±
> 2,032 ns/op
>
> Patch:
> Benchmark (charsetName) Mode Cnt Score Error
> Units
> StringEncode.encodeAsciiLong US-ASCII avgt 5 5492,985 ± 40,066
> ns/op
> StringEncode.encodeAsciiShort US-ASCII avgt 5 28,545 ± 4,883
> ns/op
>
> (You might see that this will go back into a scalar loop on encoding
> failures. That loop could still benefit from unrolling or
> byteArrayViewVarHandle, but I think you have a bigger problem in an app
> than raw performance if you have a lot of encoding failures...)
>
> WDYT?
>
> /Claes
>
> 18 feb. 2023 kl. 19:36 skrev Brett Okken <brett.okken.os at gmail.com>:
>
>
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981
>
> For String.encodeASCII, with the LATIN1 coder is there any interest in
> exploring the performance impacts of utilizing a
> byteArrayViewVarHandle to read/write as longs and utilize a bitmask to
> identify if negative values are present?
>
> A simple jmh benchmark covering either 0 or 1 non-ascii (negative)
> values shows times cut in half (or more) for most scenarios with
> strings ranging in length from 3 - ~2000.
> VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10
> Windows 10 Intel(R) Core(TM) i7-9850H
>
> Hand unrolling the loops shows noted improvement, but does make for
> less aesthetically pleasing code.
>
>
> Benchmark (nonascii) (size) Mode
> Cnt Score Error Units
> AsciiEncodeBenchmark.jdk none 3 avgt
> 4 15.531 ± 1.122 ns/op
> AsciiEncodeBenchmark.jdk none 10 avgt
> 4 17.350 ± 0.473 ns/op
> AsciiEncodeBenchmark.jdk none 16 avgt
> 4 18.277 ± 0.421 ns/op
> AsciiEncodeBenchmark.jdk none 23 avgt
> 4 20.139 ± 0.350 ns/op
> AsciiEncodeBenchmark.jdk none 33 avgt
> 4 22.008 ± 0.656 ns/op
> AsciiEncodeBenchmark.jdk none 42 avgt
> 4 24.393 ± 1.155 ns/op
> AsciiEncodeBenchmark.jdk none 201 avgt
> 4 55.884 ± 0.645 ns/op
> AsciiEncodeBenchmark.jdk none 511 avgt
> 4 120.817 ± 2.917 ns/op
> AsciiEncodeBenchmark.jdk none 2087 avgt
> 4 471.039 ± 13.329 ns/op
> AsciiEncodeBenchmark.jdk first 3 avgt
> 4 15.794 ± 1.494 ns/op
> AsciiEncodeBenchmark.jdk first 10 avgt
> 4 18.446 ± 0.780 ns/op
> AsciiEncodeBenchmark.jdk first 16 avgt
> 4 20.458 ± 0.394 ns/op
> AsciiEncodeBenchmark.jdk first 23 avgt
> 4 22.934 ± 0.422 ns/op
> AsciiEncodeBenchmark.jdk first 33 avgt
> 4 25.367 ± 0.178 ns/op
> AsciiEncodeBenchmark.jdk first 42 avgt
> 4 28.620 ± 0.678 ns/op
> AsciiEncodeBenchmark.jdk first 201 avgt
> 4 80.250 ± 4.376 ns/op
> AsciiEncodeBenchmark.jdk first 511 avgt
> 4 185.518 ± 6.370 ns/op
> AsciiEncodeBenchmark.jdk first 2087 avgt
> 4 713.213 ± 13.488 ns/op
> AsciiEncodeBenchmark.jdk last 3 avgt
> 4 14.991 ± 0.190 ns/op
> AsciiEncodeBenchmark.jdk last 10 avgt
> 4 18.284 ± 0.317 ns/op
> AsciiEncodeBenchmark.jdk last 16 avgt
> 4 20.591 ± 1.002 ns/op
> AsciiEncodeBenchmark.jdk last 23 avgt
> 4 22.560 ± 0.963 ns/op
> AsciiEncodeBenchmark.jdk last 33 avgt
> 4 25.521 ± 0.554 ns/op
> AsciiEncodeBenchmark.jdk last 42 avgt
> 4 28.484 ± 0.446 ns/op
> AsciiEncodeBenchmark.jdk last 201 avgt
> 4 79.434 ± 2.256 ns/op
> AsciiEncodeBenchmark.jdk last 511 avgt
> 4 186.639 ± 4.258 ns/op
> AsciiEncodeBenchmark.jdk last 2087 avgt
> 4 725.196 ± 149.416 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 3 avgt
> 4 7.222 ± 0.428 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 10 avgt
> 4 8.070 ± 0.171 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 16 avgt
> 4 6.711 ± 0.409 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 23 avgt
> 4 12.906 ± 3.633 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 33 avgt
> 4 10.414 ± 0.447 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 42 avgt
> 4 11.935 ± 1.235 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 201 avgt
> 4 29.538 ± 3.265 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 511 avgt
> 4 106.228 ± 68.475 ns/op
> AsciiEncodeBenchmark.longCheckCopy none 2087 avgt
> 4 494.845 ± 890.985 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 3 avgt
> 4 7.775 ± 0.278 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 10 avgt
> 4 13.396 ± 2.072 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 16 avgt
> 4 13.528 ± 0.702 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 23 avgt
> 4 17.376 ± 0.360 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 33 avgt
> 4 16.251 ± 0.203 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 42 avgt
> 4 17.932 ± 1.773 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 201 avgt
> 4 39.028 ± 4.699 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 511 avgt
> 4 92.599 ± 11.078 ns/op
> AsciiEncodeBenchmark.longCheckCopy first 2087 avgt
> 4 347.728 ± 7.837 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 3 avgt
> 4 7.472 ± 0.078 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 10 avgt
> 4 8.371 ± 0.815 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 16 avgt
> 4 6.766 ± 0.253 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 23 avgt
> 4 12.879 ± 0.454 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 33 avgt
> 4 10.491 ± 0.811 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 42 avgt
> 4 12.435 ± 1.212 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 201 avgt
> 4 28.507 ± 1.058 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 511 avgt
> 4 85.763 ± 1.941 ns/op
> AsciiEncodeBenchmark.longCheckCopy last 2087 avgt
> 4 411.555 ± 3.595 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 3 avgt
> 4 5.858 ± 0.637 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 10 avgt
> 4 7.031 ± 0.274 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 16 avgt
> 4 6.768 ± 0.222 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 23 avgt
> 4 10.084 ± 0.102 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 33 avgt
> 4 9.876 ± 0.240 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 42 avgt
> 4 11.061 ± 0.590 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 201 avgt
> 4 29.264 ± 1.690 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 511 avgt
> 4 61.920 ± 5.482 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll none 2087 avgt
> 4 309.183 ± 42.354 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 3 avgt
> 4 5.687 ± 0.249 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 10 avgt
> 4 9.537 ± 0.337 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 16 avgt
> 4 9.928 ± 0.329 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 23 avgt
> 4 12.510 ± 0.519 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 33 avgt
> 4 13.028 ± 0.335 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 42 avgt
> 4 13.640 ± 0.219 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 201 avgt
> 4 31.046 ± 0.647 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 511 avgt
> 4 82.998 ± 5.611 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll first 2087 avgt
> 4 360.294 ± 8.419 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 3 avgt
> 4 5.657 ± 0.197 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 10 avgt
> 4 6.997 ± 0.081 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 16 avgt
> 4 6.890 ± 1.319 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 23 avgt
> 4 10.154 ± 0.389 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 33 avgt
> 4 9.986 ± 0.592 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 42 avgt
> 4 11.481 ± 0.375 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 201 avgt
> 4 29.286 ± 0.723 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 511 avgt
> 4 61.056 ± 0.977 ns/op
> AsciiEncodeBenchmark.longCheckCopy_unroll last 2087 avgt
> 4 303.415 ± 17.326 ns/op
>
>
>
> @Benchmark
> public byte[] jdk() {
> final byte[] val = this.data;
> byte[] dst = Arrays.copyOf(val, val.length);
> for (int i = 0; i < dst.length; i++) {
> if (dst[i] < 0) {
> dst[i] = '?';
> }
> }
> return dst;
> }
>
> @Benchmark
> public byte[] longCheckCopy() {
> final byte[] val = this.data;
> byte[] dst = new byte[val.length];
> int i = 0;
> long word;
> for (int j=dst.length - 7; i < j; i+=8) {
> word = (long)LONG_BYTES.get(val, i);
> LONG_BYTES.set(dst, i, word);
> if ((word & LONG_NEG_MASK) != 0) {
> for (int x=i, y=i+8; x<y; x++) {
> if (dst[x] < 0) {
> dst[x] = '?';
> }
> }
> }
> }
> byte b;
> for (; i < dst.length; i++) {
> b = val[i];
> dst[i] = b < 0 ? (byte) '?' : b;
> }
> return dst;
> }
>
> @Benchmark
> public byte[] longCheckCopy_unroll() {
> final byte[] val = this.data;
> byte[] dst = new byte[val.length];
> int i = 0;
> long word;
> for (int j=dst.length - 7; i < j; i+=8) {
> word = (long)LONG_BYTES.get(val, i);
> LONG_BYTES.set(dst, i, word);
> if ((word & LONG_NEG_MASK) != 0) {
> if (dst[i] < 0) {
> dst[i] = '?';
> }
> if (dst[i + 1] < 0) {
> dst[i + 1] = '?';
> }
> if (dst[i + 2] < 0) {
> dst[i + 2] = '?';
> }
> if (dst[i + 3] < 0) {
> dst[i + 3] = '?';
> }
> if (dst[i + 4] < 0) {
> dst[i + 4] = '?';
> }
> if (dst[i + 5] < 0) {
> dst[i + 5] = '?';
> }
> if (dst[i + 6] < 0) {
> dst[i + 6] = '?';
> }
> if (dst[i + 7] < 0) {
> dst[i + 7] = '?';
> }
> }
> }
> byte b;
> switch (dst.length & 0x7) {
> case 7:
> b = val[i + 6];
> dst[i + 6] = b < 0 ? (byte) '?' : b;
> case 6:
> b = val[i + 5];
> dst[i + 5] = b < 0 ? (byte) '?' : b;
> case 5:
> b = val[i + 4];
> dst[i + 4] = b < 0 ? (byte) '?' : b;
> case 4:
> b = val[i + 3];
> dst[i + 3] = b < 0 ? (byte) '?' : b;
> case 3:
> b = val[i + 2];
> dst[i + 2] = b < 0 ? (byte) '?' : b;
> case 2:
> b = val[i + 1];
> dst[i + 1] = b < 0 ? (byte) '?' : b;
> case 1:
> b = val[i];
> dst[i] = b < 0 ? (byte) '?' : b;
> }
> return dst;
> }
>
>
> Thanks,
>
> Brett
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20230218/068598cf/attachment-0001.htm>
More information about the core-libs-dev
mailing list