JDK-8173585: Intrinsify StringLatin1.indexOf(char)
Tatton, Jason
jptatton at amazon.com
Tue Sep 8 12:02:14 UTC 2020
Hi Andrew, thank you for taking the time to review this.
Since we have now moved to git, I have raised a new PR for this RFR:
https://github.com/openjdk/jdk/pull/71
https://bugs.openjdk.java.net/browse/JDK-8173585
I have improved the micro benchmark in the ways which you and others have requested, namely:
+ The benchmark is now included in test/micro/org/openjdk/bench/java/lang as StringIndexOfChar (as advised by my colleagues here at AWS, Xin Liu and Volker Simonis).
+ Times are now in nanoseconds.
+ Terminating characters ('a') are in 66.666% of tested strings.
+ I have added four new benchmarks which operate on random-length strings (32 characters being the average) of type either StringLatin1 or StringUTF16 and call indexOf(char) or indexOf(String).
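To make the data shape described in the list above concrete, here is a minimal sketch of how such an input set could be generated. It is not the code from the PR: the class name MixedStringData, the uniform length range of 1 to 63 (which averages 32) and the 'b'..'z' filler alphabet are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical helper, not part of the PR: builds a mixed Latin-1 data set.
class MixedStringData {

    // Returns `count` strings whose lengths are uniform in [1, 63] (mean 32).
    // About two thirds of them contain exactly one 'a' at a random position;
    // the remainder contain no 'a' at all, so a search for 'a' fails on them.
    static String[] generate(int count, long seed) {
        Random rnd = new Random(seed);
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            int len = 1 + rnd.nextInt(63);                  // 1..63 inclusive
            char[] chars = new char[len];
            for (int j = 0; j < len; j++) {
                chars[j] = (char) ('b' + rnd.nextInt(25));  // 'b'..'z', never 'a'
            }
            if (rnd.nextInt(3) < 2) {                       // two cases in three
                chars[rnd.nextInt(len)] = 'a';              // place the terminator
            }
            out[i] = new String(chars);
        }
        return out;
    }
}
```

A fixed seed keeps the set identical between the runs with and without the intrinsic, so the two result tables compare like with like.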
I have included the output of these tests below:
Without the new StringLatin1 indexOf(char) intrinsic:
Benchmark                           Mode  Cnt      Score     Error  Units
IndexOfBenchmark.latin1_mixed_char  avgt    5  26389.129 ± 182.581  ns/op
IndexOfBenchmark.utf16_mixed_char   avgt    5  17885.383 ± 435.933  ns/op
With the new StringLatin1 indexOf(char) intrinsic:
Benchmark                           Mode  Cnt      Score     Error  Units
IndexOfBenchmark.latin1_mixed_char  avgt    5  17875.185 ± 407.716  ns/op
IndexOfBenchmark.utf16_mixed_char   avgt    5  18292.802 ± 167.306  ns/op
The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when running on ARM.
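For anyone reading along without the source open, the scalar behaviour that the intrinsic has to reproduce can be sketched roughly as below. This is an illustrative stand-in and not the JDK source: the class name Latin1Scan is hypothetical, and the sketch only shows the contract (first matching byte at or after fromIndex, else -1).

```java
// Hypothetical illustration of a byte-wise indexOf over Latin-1 data.
class Latin1Scan {

    // Returns the index of the first byte equal to ch at or after fromIndex,
    // or -1 if there is none. A ch above 0xFF cannot occur in Latin-1 data.
    static int indexOf(byte[] value, int ch, int fromIndex) {
        if ((ch >>> 8) != 0) {
            return -1;                  // not representable in one byte
        }
        if (fromIndex < 0) {
            fromIndex = 0;              // clamp negative start positions
        }
        byte c = (byte) ch;
        for (int i = fromIndex; i < value.length; i++) {
            if (value[i] == c) {
                return i;               // first match wins
            }
        }
        return -1;                      // also covers fromIndex >= length
    }
}
```

A vectorised version compares 16 or 32 bytes per step instead of one, which is why the gap to the UTF16 path shows up most clearly on the longer strings.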
Regards,
Jason
-----Original Message-----
From: Andrew Haley <aph at redhat.com>
Sent: 05 September 2020 15:47
To: Tatton, Jason <jptatton at amazon.com>; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net
Subject: RE: [EXTERNAL] JDK-8173585: Intrinsify StringLatin1.indexOf(char)
On 03/09/2020 22:28, Tatton, Jason wrote:
>
> JMH Benchmark results:
> ====================
> The benchmarks examine the 3 codepaths for StringLatin1 and
> StringUTF16. Here are the results for Intel x86 (ARM is similar):
>
> FYI, String lengths in characters (1 byte for Latin1, 2 bytes for UTF16):
>          Latin1  UTF16
> Short:       15      7
> SSE4:        16      8
> AVX2:        32     16
>
> Without StringLatin1 indexofchar intrinsic:
> Benchmark                             Mode   Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  121781.424 ±  355.085  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5   46060.612 ±  151.274  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  197339.146 ±   90.333  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5   61401.204 ±  426.761  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  175389.355 ±  294.976  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   60759.868 ±  124.349  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  123601.020 ±  111.981  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  141116.832 ±  380.489  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  178136.762 ±  143.227  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  181430.649 ±  120.097  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  158301.361 ±  182.738  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   84876.919 ±  247.769  ops/s
>
> With StringLatin1 indexofchar intrinsic:
> Benchmark                             Mode   Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  113621.676 ±   68.235  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5  177757.909 ±  727.308  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  180529.049 ±   57.356  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5  235087.776 ±  457.024  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  165914.990 ±  329.024  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   53989.544 ±   65.393  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  107632.783 ±  446.272  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  143131.734 ±  159.944  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  169882.703 ± 1024.367  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  175693.972 ±  775.423  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  163595.993 ±  225.089  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   90126.154 ±  365.642  ops/s
>
> We can see above that indexOf(char) now behaves similarly between
> StringUTF16 and StringLatin1.
This is confusing. Can you please make the times nanoseconds? It's quite a struggle trying to think in reciprocal units for these very low-level benchmarks. Maybe it's just me.
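For reference, going from a throughput figure to a time per operation is just a reciprocal scaled to nanoseconds. A minimal sketch, with a hypothetical class name ThroughputUnits:

```java
// Hypothetical helper: converts a throughput score into an average time.
class ThroughputUnits {

    // ops/s -> ns/op: one second is 1e9 ns, so the time per op is 1e9 / rate.
    static double nanosPerOp(double opsPerSecond) {
        return 1e9 / opsPerSecond;
    }

    public static void main(String[] args) {
        // Scores taken from the quoted latin1_AVX2_char rows above.
        System.out.printf("without intrinsic: %.1f ns/op%n", nanosPerOp(46060.612));
        System.out.printf("with intrinsic:    %.1f ns/op%n", nanosPerOp(177757.909));
    }
}
```

On the latin1_AVX2_char rows this works out to roughly 21,700 ns/op before and roughly 5,600 ns/op after, which is much easier to compare by eye.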
There are 1000 strings of length 32 bytes, so I guess that makes everything fit in L1, just. I guess that was the idea?
> //'a is never present in rnd string
So you only benchmark searches that always fail? I don't get that at all.
I'd also vary string lengths. 32 characters is a good average, so you should have a decent spread of different lengths, averaging 32 over the whole set. I'd place a terminating character randomly in *at least* 50% of the strings.
I think that would be much more representative.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671