JDK-8173585: Intrinsify StringLatin1.indexOf(char)
Andrew Haley
aph at redhat.com
Sat Sep 5 14:47:00 UTC 2020
On 03/09/2020 22:28, Tatton, Jason wrote:
>
> JMH Benchmark results:
> ====================
> The benchmarks examine the three code paths (Short, SSE4 and AVX2) for
> StringLatin1 and StringUTF16. Here are the results for Intel x86 (ARM is similar):
>
> FYI, string lengths in characters (1 byte per char for Latin1, 2 bytes per char for UTF16):
>          Latin1  UTF16
> Short:       15      7
> SSE4:        16      8
> AVX2:        32     16
>
> Without the StringLatin1.indexOf(char) intrinsic:
> Benchmark                              Mode  Cnt       Score       Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  121781.424  ±  355.085  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5   46060.612  ±  151.274  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  197339.146  ±   90.333  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5   61401.204  ±  426.761  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  175389.355  ±  294.976  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   60759.868  ±  124.349  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  123601.020  ±  111.981  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  141116.832  ±  380.489  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  178136.762  ±  143.227  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  181430.649  ±  120.097  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  158301.361  ±  182.738  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   84876.919  ±  247.769  ops/s
>
> With the StringLatin1.indexOf(char) intrinsic:
> Benchmark                              Mode  Cnt       Score       Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  113621.676  ±   68.235  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5  177757.909  ±  727.308  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  180529.049  ±   57.356  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5  235087.776  ±  457.024  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  165914.990  ±  329.024  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   53989.544  ±   65.393  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  107632.783  ±  446.272  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  143131.734  ±  159.944  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  169882.703  ± 1024.367  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  175693.972  ±  775.423  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  163595.993  ±  225.089  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   90126.154  ±  365.642  ops/s
>
> We can see above that indexOf(char) now behaves similarly between
> StringUTF16 and StringLatin1.
This is confusing. Can you please report the results as average time per
operation in nanoseconds? It's quite a struggle to think in reciprocal units
for these very low-level benchmarks. Maybe it's just me.
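The conversion being asked for is just ns/op = 10^9 / (ops/s). Below is a minimal Java sketch applying it to some of the `_char` rows from the two tables, assuming one JMH "op" is one call of the benchmark method (whether a call searches one string or all 1000 is not stated here; if it is all 1000, divide again by 1000). The class name is hypothetical. JMH can also report this directly, e.g. with `@BenchmarkMode(Mode.AverageTime)` plus `@OutputTimeUnit(TimeUnit.NANOSECONDS)`, or `-bm avgt -tu ns` on the command line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThroughputToNanos {
    // Average time per operation in nanoseconds for a JMH throughput score in ops/s.
    static double nsPerOp(double opsPerSecond) {
        return 1e9 / opsPerSecond;
    }

    public static void main(String[] args) {
        // Scores copied from the tables: {without intrinsic, with intrinsic}.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("latin1_AVX2_char", new double[] {46060.612, 177757.909});
        scores.put("latin1_SSE4_char", new double[] {61401.204, 235087.776});
        scores.put("latin1_Short_char", new double[] {60759.868, 53989.544});
        scores.put("utf16_Short_char", new double[] {84876.919, 90126.154});
        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double before = nsPerOp(e.getValue()[0]);
            double after = nsPerOp(e.getValue()[1]);
            System.out.printf("%-18s %9.1f ns/op -> %9.1f ns/op%n", e.getKey(), before, after);
        }
    }
}
```

In these units latin1_AVX2_char goes from about 21,710 ns/op to about 5,626 ns/op, while latin1_Short_char goes from about 16,458 to about 18,522 ns/op, i.e. it gets slower.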
There are 1000 strings of 32 bytes each, so I guess that makes everything
fit in L1, just. Was that the idea?
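The arithmetic behind "fits in L1, just", as a small Java sketch. It assumes a 32 KiB L1 data cache (common, but an assumption here) and counts only the character payload; the String objects, the byte[] headers and the array holding the references are not counted. The class name is hypothetical.

```java
public class WorkingSet {
    // Character payload only: ignores String/byte[] object headers and the
    // String[] holding the references, all of which add to the real footprint.
    static long payloadBytes(int strings, int bytesPerString) {
        return (long) strings * bytesPerString;
    }

    public static void main(String[] args) {
        long payload = payloadBytes(1000, 32); // 1000 strings of 32 bytes
        long l1d = 32L * 1024;                 // assumed 32 KiB L1 data cache
        System.out.println(payload + " payload bytes vs " + l1d
                + " bytes of L1D, headroom " + (l1d - payload) + " bytes");
    }
}
```

That leaves 768 bytes of headroom, so the margin is thin, and the uncounted object headers eat into it.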
> // 'a' is never present in rnd string
So you only benchmark searches that always fail? I don't get that at
all.
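To make the objection concrete, here is a plain scalar search. It is a hypothetical sketch, not the JDK's StringLatin1 code and not the intrinsic. On a miss it examines every byte, but when the character is found at index p it stops after p + 1 comparisons. A miss-only benchmark therefore always measures the full-length scan and never the early-exit path.

```java
import java.nio.charset.StandardCharsets;

public class ScanCost {
    // Scalar Latin1 search that also reports how many bytes it compared:
    // returns {index or -1, comparisons}.
    static int[] indexOfWithCount(byte[] value, int ch) {
        int compared = 0;
        for (int i = 0; i < value.length; i++) {
            compared++;
            if ((value[i] & 0xff) == ch) {
                return new int[] {i, compared};
            }
        }
        return new int[] {-1, compared};
    }

    public static void main(String[] args) {
        // Two 32-byte strings: one without 'a', one with 'a' at index 4.
        byte[] miss = "bcdefghijklmnopqrstuvwxyzbcdefgh".getBytes(StandardCharsets.ISO_8859_1);
        byte[] hit = "bcdeafghijklmnopqrstuvwxyzbcdefg".getBytes(StandardCharsets.ISO_8859_1);
        int[] m = indexOfWithCount(miss, 'a');
        int[] h = indexOfWithCount(hit, 'a');
        System.out.println("miss: index " + m[0] + ", compared " + m[1]);
        System.out.println("hit:  index " + h[0] + ", compared " + h[1]);
    }
}
```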
I'd also vary the string lengths. 32 characters is a good average, so you
should have a decent spread of different lengths, averaging 32 over the
whole set. I'd place the character being searched for at a random position
in *at least* 50% of the strings.
I think that would be much more representative.
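Below is a minimal sketch of the data set suggested above. It is one possible reading, not a drop-in replacement for the IndexOfBenchmark setup. It assumes lengths uniform over [1, 63] (expected mean 32). It uses lowercase ASCII letters as the alphabet so every string stays Latin1, and 'a' as the target, as in the existing benchmark. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Random;

public class IndexOfDataset {
    // Lengths are uniform over [1, 63], so the expected mean length is 32.
    // Exactly ceil(count * hitFraction) strings contain the target once, at a
    // random position; the rest never contain it.
    static String[] generate(int count, double hitFraction, char target, long seed) {
        Random rnd = new Random(seed);
        int hits = (int) Math.ceil(count * hitFraction);
        String[] out = new String[count];
        for (int s = 0; s < count; s++) {
            int len = 1 + rnd.nextInt(63);
            char[] chars = new char[len];
            for (int i = 0; i < len; i++) {
                char c;
                do {
                    c = (char) ('a' + rnd.nextInt(26)); // Latin1-only letters
                } while (c == target);
                chars[i] = c;
            }
            if (s < hits) {
                chars[rnd.nextInt(len)] = target; // random hit position
            }
            out[s] = new String(chars);
        }
        // Shuffle so hits and misses are interleaved rather than grouped.
        Collections.shuffle(Arrays.asList(out), rnd);
        return out;
    }

    public static void main(String[] args) {
        String[] data = generate(1000, 0.5, 'a', 42L);
        long found = Arrays.stream(data).filter(s -> s.indexOf('a') >= 0).count();
        double meanLen = Arrays.stream(data).mapToInt(String::length).average().orElse(0);
        System.out.printf("%d strings, %d contain 'a', mean length %.1f%n",
                data.length, found, meanLen);
    }
}
```

With hitFraction = 0.5, exactly 500 of 1000 strings contain 'a'; a larger hitFraction covers the "at least 50%" case. The fixed seed keeps runs reproducible across JVM builds with and without the intrinsic.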
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671