RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

Wu Yan wuyan at openjdk.java.net
Thu Aug 26 09:29:31 UTC 2021


On Wed, 25 Aug 2021 07:40:56 GMT, Nick Gasson <ngasson at openjdk.org> wrote:

> I've run the benchmark on several different machines and didn't see any performance regressions, and the speed-up for longer strings looks quite good. I also ran jtreg tier1-3 with no new failures so I think this is ok.
> 
> If you fix the Windows build I'll approve it. But please wait for another review, preferably from @theRealAph.

OK, thank you very much!


> Note that JDK-8269559 (#5129) is also adding a JMH benchmark for this intrinsic: it would be good if we could merge them, either now or later.

The JMH benchmark added by JDK-8269559 (#5129) covers our test cases (compareToLL and compareToUU) and shows the improvement from our patch, so we will remove our own JMH benchmark in the next commit.
The results with the JMH benchmark from JDK-8269559 are as follows:

Raspberry Pi 4B
base:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    3   2.310 ± 0.050  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    3   2.818 ± 0.185  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    3   3.151 ± 0.215  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    3   4.171 ± 1.320  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    3   6.169 ± 0.653  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    3  10.911 ± 0.175  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    3   3.312 ± 0.102  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    3   4.162 ± 0.032  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    3   5.705 ± 0.152  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    3   9.301 ± 0.749  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    3  16.507 ± 1.353  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    3  30.160 ± 0.377  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    3   3.366 ± 0.280  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    3   4.308 ± 0.037  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    3   5.674 ± 0.210  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    3   9.358 ± 0.158  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    3  16.165 ± 0.158  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    3  29.857 ± 0.277  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    3   3.149 ± 0.209  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    3   3.157 ± 0.102  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    3   4.415 ± 0.073  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    3   6.244 ± 0.224  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    3  11.032 ± 0.080  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    3  20.942 ± 3.973  ms/op

opt:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    3   2.319 ± 0.121  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    3   2.820 ± 0.096  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    3   2.511 ± 0.024  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    3   3.496 ± 0.382  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    3   5.215 ± 0.210  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    3   7.772 ± 0.448  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    3   3.432 ± 0.249  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    3   4.156 ± 0.052  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    3   5.735 ± 0.043  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    3   9.215 ± 0.394  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    3  16.373 ± 0.515  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    3  29.906 ± 0.245  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    3   3.361 ± 0.116  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    3   4.253 ± 0.061  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    3   5.744 ± 0.082  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    3   9.167 ± 0.343  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    3  16.591 ± 0.999  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    3  30.232 ± 2.057  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    3   3.147 ± 0.057  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    3   2.526 ± 0.027  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    3   3.832 ± 0.228  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    3   5.332 ± 0.173  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    3   8.417 ± 0.551  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    3  14.903 ± 0.782  ms/op

Hisilicon
base:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt   30   0.824 ± 0.003  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt   30   1.123 ± 0.050  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt   30   1.550 ± 0.052  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt   30   2.015 ± 0.040  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt   30   3.154 ± 0.032  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt   30   5.519 ± 0.044  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt   30   1.469 ± 0.196  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt   30   1.777 ± 0.097  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt   30   2.509 ± 0.073  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt   30   3.914 ± 0.044  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt   30   6.773 ± 0.049  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt   30  12.504 ± 0.081  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt   30   1.505 ± 0.107  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt   30   1.976 ± 0.145  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt   30   2.593 ± 0.082  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt   30   3.998 ± 0.062  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt   30   6.949 ± 0.110  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt   30  12.617 ± 0.068  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt   30   1.232 ± 0.038  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt   30   1.505 ± 0.008  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt   30   2.218 ± 0.066  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt   30   3.329 ± 0.119  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt   30   5.684 ± 0.030  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt   30  10.520 ± 0.031  ms/op

opt:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt   30   0.824 ± 0.003  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt   30   1.124 ± 0.032  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt   30   1.376 ± 0.123  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt   30   1.921 ± 0.040  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt   30   2.656 ± 0.156  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt   30   4.311 ± 0.267  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt   30   1.391 ± 0.154  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt   30   1.891 ± 0.170  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt   30   2.496 ± 0.082  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt   30   3.978 ± 0.046  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt   30   6.811 ± 0.057  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt   30  12.586 ± 0.054  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt   30   1.462 ± 0.085  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt   30   1.864 ± 0.070  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt   30   2.651 ± 0.090  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt   30   4.223 ± 0.383  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt   30   6.858 ± 0.085  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt   30  12.675 ± 0.099  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt   30   1.200 ± 0.013  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt   30   1.336 ± 0.156  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt   30   2.364 ± 0.545  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt   30   2.753 ± 0.154  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt   30   5.179 ± 0.834  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt   30   7.090 ± 0.423  ms/op
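
To read the tables as a speed-up factor, one can divide the base score by the opt score for the same row (both are average time per op, so lower is better). A minimal sketch of that arithmetic follows; the function and constant names are hypothetical and only the four scores are taken from the tables above (size 512 rows).

```cpp
// Speed-up factor for an "avgt" (average time) benchmark: base time divided
// by optimized time. A value above 1.0 means the optimized build is faster.
constexpr double speedup(double base_ms_per_op, double opt_ms_per_op) {
  return base_ms_per_op / opt_ms_per_op;
}

// Scores copied from the size-512 rows above (ms/op); the names are
// illustrative only.
constexpr double kPi4CompareToLLBase = 10.911;
constexpr double kPi4CompareToLLOpt = 7.772;
constexpr double kHisiliconCompareToUUBase = 10.520;
constexpr double kHisiliconCompareToUUOpt = 7.090;

constexpr double kPi4LLSpeedup =
    speedup(kPi4CompareToLLBase, kPi4CompareToLLOpt);
constexpr double kHisiliconUUSpeedup =
    speedup(kHisiliconCompareToUUBase, kHisiliconCompareToUUOpt);
```

Note that this ignores the error column; rows whose error intervals overlap (for example most compareToLU and compareToUL rows) should be read as unchanged rather than as a small gain or loss.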

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4871:
> 
>> 4869:     // exit from large loop when less than 64 bytes left to read or we're about
>> 4870:     // to prefetch memory behind array border
>> 4871:     int largeLoopExitCondition = MAX(64, SoftwarePrefetchHintDistance)/(isLL ? 1 : 2);
> 
> This breaks the Windows AArch64 build:
> 
> 
> Creating support/modules_libs/java.base/server/jvm.dll from 1051 file(s)
> d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(4871): error C3861: 'MAX': identifier not found
> make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm
> 
> 
> https://github.com/Wanghuang-Huawei/jdk/runs/3260986937
> 
> Should probably be left as `MAX2`.

Thanks, I'll fix it.
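
For reference, a minimal standalone sketch of the fix, assuming the usual situation that `MAX` is a macro supplied by some platform headers (and so is missing under MSVC), while HotSpot's `MAX2` is an ordinary function template that is available on every platform. The `max2` helper and `large_loop_exit_condition` function below are hypothetical stand-ins written for this sketch, not the HotSpot declarations; `prefetch_hint_distance` plays the role of `SoftwarePrefetchHintDistance`.

```cpp
// Stand-in for HotSpot's MAX2 (an assumption for this sketch). As a plain
// function template it does not rely on any platform header defining MAX.
template <class T>
constexpr T max2(T a, T b) {
  return (a > b) ? a : b;
}

// Hypothetical free function mirroring the expression quoted at line 4871:
//   MAX2(64, SoftwarePrefetchHintDistance) / (isLL ? 1 : 2)
constexpr int large_loop_exit_condition(bool is_ll, int prefetch_hint_distance) {
  // The threshold is never below 64, and is halved when is_ll is false,
  // exactly as in the quoted line.
  return max2(64, prefetch_hint_distance) / (is_ll ? 1 : 2);
}

// Compile-time checks with illustrative prefetch distances.
static_assert(large_loop_exit_condition(true, 192) == 192, "LL, distance 192");
static_assert(large_loop_exit_condition(false, 192) == 96, "non-LL, distance 192");
static_assert(large_loop_exit_condition(true, -1) == 64, "LL, small distance");
static_assert(large_loop_exit_condition(false, -1) == 32, "non-LL, small distance");
```

Because both arguments are `int`, template deduction succeeds without casts; mixing types (for example `int` and `size_t`) would need an explicit conversion.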

-------------

PR: https://git.openjdk.java.net/jdk/pull/4722


More information about the hotspot-compiler-dev mailing list