JDK-8173585: Intrinsify StringLatin1.indexOf(char)

Tatton, Jason jptatton at amazon.com
Tue Sep 8 12:02:14 UTC 2020


Hi Andrew, thank you for taking the time to review this.

Since we have now moved to git, I have raised a new PR for this RFR:

https://github.com/openjdk/jdk/pull/71
https://bugs.openjdk.java.net/browse/JDK-8173585

I have improved the micro benchmark in the ways that you and others requested, namely:
+ The benchmark is now included in test/micro/org/openjdk/bench/java/lang as StringIndexOfChar (as advised by my colleagues here at AWS, Xin Liu and Volker Simonis).
+ Times are now in nanoseconds.
+ Terminating characters ('a') are in 66.666% of tested strings.
+ I have added four new benchmarks which operate on random-length strings (32 characters on average) of type either StringLatin1 or StringUTF16 and call indexOf(char) or indexOf(String).
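
To illustrate the kind of input described in the list above, here is a minimal sketch of how such test data could be generated. It is not the code from the PR: the class name MixedStringData, the uniform length distribution over 1..63 (mean 32) and the fixed seed are all assumptions made for this example.

```java
import java.util.Random;

// Hypothetical helper, not part of the PR: builds benchmark input strings.
class MixedStringData {

    /** Returns `count` Latin1-only strings with lengths uniform in [1, 63], so the mean is 32. */
    static String[] generate(int count, long seed) {
        Random rnd = new Random(seed);
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            int len = 1 + rnd.nextInt(63);                 // 1..63
            char[] chars = new char[len];
            for (int j = 0; j < len; j++) {
                chars[j] = (char) ('b' + rnd.nextInt(25)); // 'b'..'z', never 'a'
            }
            if (rnd.nextInt(3) < 2) {                      // about two thirds of the strings
                chars[rnd.nextInt(len)] = 'a';             // terminating character at a random position
            }
            out[i] = new String(chars);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] data = generate(1000, 42L);
        int hits = 0;
        long totalLength = 0;
        for (String s : data) {
            if (s.indexOf('a') >= 0) {
                hits++;
            }
            totalLength += s.length();
        }
        System.out.println("strings containing 'a': " + hits + " of " + data.length);
        System.out.println("average length: " + (double) totalLength / data.length);
    }
}
```

A UTF16 variant could be produced the same way by mixing in at least one character above U+00FF, which forces the string out of the compact Latin1 representation.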

I have included the output of these tests below:

Without the new StringLatin1 indexOf(char) intrinsic:

Benchmark                           Mode  Cnt      Score     Error  Units
IndexOfBenchmark.latin1_mixed_char  avgt    5  26389.129 ± 182.581  ns/op
IndexOfBenchmark.utf16_mixed_char   avgt    5  17885.383 ± 435.933  ns/op


With the new StringLatin1 indexOf(char) intrinsic:

Benchmark                           Mode  Cnt      Score     Error  Units
IndexOfBenchmark.latin1_mixed_char  avgt    5  17875.185 ± 407.716  ns/op
IndexOfBenchmark.utf16_mixed_char   avgt    5  18292.802 ± 167.306  ns/op

The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when running on ARM.
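
For context, the operation being intrinsified is a search for a single char in a string stored as one byte per character. The following is only an illustrative scalar version, assuming a plain byte[] as the backing store; it is neither the JDK source nor the intrinsic itself, and the names Latin1Search and indexOfChar are invented for this sketch.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a scalar search over a Latin1-encoded byte array.
class Latin1Search {

    /** Returns the index of the first occurrence of ch at or after fromIndex, or -1 if absent. */
    static int indexOfChar(byte[] value, int ch, int fromIndex) {
        if ((ch >>> 8) != 0) {
            return -1;                  // ch does not fit in one byte, so it cannot occur
        }
        byte target = (byte) ch;
        for (int i = Math.max(fromIndex, 0); i < value.length; i++) {
            if (value[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] bytes = "hello a".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOfChar(bytes, 'a', 0));
        System.out.println(indexOfChar(bytes, 'z', 0));
    }
}
```

A vectorised version compares many bytes per instruction instead of one per loop iteration, which is where the SSE4 and AVX2 code paths in the benchmark names come from.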

Regards,
Jason

-----Original Message-----
From: Andrew Haley <aph at redhat.com> 
Sent: 05 September 2020 15:47
To: Tatton, Jason <jptatton at amazon.com>; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net
Subject: RE: [EXTERNAL] JDK-8173585: Intrinsify StringLatin1.indexOf(char)


On 03/09/2020 22:28, Tatton, Jason wrote:

>
> JMH Benchmark results:
> ====================
> The benchmarks examine the 3 codepaths for StringLatin1 and 
> StringUTF16. Here are the results for Intel x86 (ARM is similar):
>
> FYI, String lengths in characters (1byte for Latin1, 2bytes for UTF16):
>        Latin1  UTF16
> Short: 15       7
> SSE4:  16       8
> AVX2:  32       16
>
> Without StringLatin1 indexofchar intrinsic:
> Benchmark                              Mode  Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  121781.424 ± 355.085  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5   46060.612 ± 151.274  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  197339.146 ±  90.333  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5   61401.204 ± 426.761  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  175389.355 ± 294.976  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   60759.868 ± 124.349  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  123601.020 ± 111.981  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  141116.832 ± 380.489  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  178136.762 ± 143.227  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  181430.649 ± 120.097  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  158301.361 ± 182.738  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   84876.919 ± 247.769  ops/s
>
> With StringLatin1 indexofchar intrinsic:
> Benchmark                              Mode  Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  113621.676 ±   68.235  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5  177757.909 ±  727.308  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  180529.049 ±   57.356  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5  235087.776 ±  457.024  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  165914.990 ±  329.024  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   53989.544 ±   65.393  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  107632.783 ±  446.272  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  143131.734 ±  159.944  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  169882.703 ± 1024.367  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  175693.972 ±  775.423  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  163595.993 ±  225.089  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   90126.154 ±  365.642  ops/s
>
> We can see above that indexOf(char) now behaves similarly between
> StringUTF16 and StringLatin1.

This is confusing. Can you please make the times nanoseconds?  It's quite a struggle trying to think in reciprocal units for these very low-level benchmarks. Maybe it's just me.
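
For reference, a throughput figure in ops/s converts to time per operation as 1e9 / (ops/s). A minimal sketch of that arithmetic follows; the class and method names are invented for this example.

```java
// Illustrative conversion from JMH throughput (ops/s) to average time (ns/op).
class ThroughputToLatency {

    /** Converts operations per second into nanoseconds per operation. */
    static double nanosPerOp(double opsPerSecond) {
        return 1_000_000_000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        // latin1_AVX2_char without and with the intrinsic, taken from the tables above.
        System.out.printf("without: %.1f ns/op%n", nanosPerOp(46060.612));
        System.out.printf("with:    %.1f ns/op%n", nanosPerOp(177757.909));
    }
}
```

For latin1_AVX2_char this gives roughly 21,700 ns/op without the intrinsic and roughly 5,600 ns/op with it. JMH can also report time directly when the benchmark is run in average-time mode with nanoseconds as the output unit.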

There are 1000 strings of length 32 bytes, so I guess that makes everything fit in L1, just. I guess that was the idea?

>            //'a is never present in rnd string

So you only benchmark searches that always fail? I don't get that at all.

I'd also vary string lengths. 32 characters is a good average, so you should have a decent spread of different lengths, with the average over the whole set being 32. I'd place a terminating character randomly in *at least* 50% of the strings.

I think that would be much more representative.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


