RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
Wu Yan
wuyan at openjdk.java.net
Thu Aug 26 09:29:31 UTC 2021
On Wed, 25 Aug 2021 07:40:56 GMT, Nick Gasson <ngasson at openjdk.org> wrote:
> I've run the benchmark on several different machines and didn't see any performance regressions, and the speed-up for longer strings looks quite good. I also ran jtreg tier1-3 with no new failures so I think this is ok.
>
> If you fix the Windows build I'll approve it. But please wait for another review, preferably from @theRealAph.
OK, thank you very much!
> Note that JDK-8269559 (#5129) is also adding a JMH benchmark for this intrinsic: it would be good if we could merge them, either now or later.
The JMH benchmark added by JDK-8269559 (#5129) covers our test cases (compareToLL and compareToUU) and shows the improvement from our patch, so we will delete our own JMH benchmark in the next commit.
The results from the JDK-8269559 JMH benchmark are as follows:
Raspberry Pi 4B
base:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    3   2.310 ± 0.050  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    3   2.818 ± 0.185  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    3   3.151 ± 0.215  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    3   4.171 ± 1.320  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    3   6.169 ± 0.653  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    3  10.911 ± 0.175  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    3   3.312 ± 0.102  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    3   4.162 ± 0.032  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    3   5.705 ± 0.152  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    3   9.301 ± 0.749  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    3  16.507 ± 1.353  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    3  30.160 ± 0.377  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    3   3.366 ± 0.280  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    3   4.308 ± 0.037  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    3   5.674 ± 0.210  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    3   9.358 ± 0.158  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    3  16.165 ± 0.158  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    3  29.857 ± 0.277  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    3   3.149 ± 0.209  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    3   3.157 ± 0.102  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    3   4.415 ± 0.073  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    3   6.244 ± 0.224  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    3  11.032 ± 0.080  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    3  20.942 ± 3.973  ms/op
opt:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    3   2.319 ± 0.121  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    3   2.820 ± 0.096  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    3   2.511 ± 0.024  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    3   3.496 ± 0.382  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    3   5.215 ± 0.210  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    3   7.772 ± 0.448  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    3   3.432 ± 0.249  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    3   4.156 ± 0.052  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    3   5.735 ± 0.043  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    3   9.215 ± 0.394  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    3  16.373 ± 0.515  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    3  29.906 ± 0.245  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    3   3.361 ± 0.116  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    3   4.253 ± 0.061  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    3   5.744 ± 0.082  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    3   9.167 ± 0.343  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    3  16.591 ± 0.999  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    3  30.232 ± 2.057  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    3   3.147 ± 0.057  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    3   2.526 ± 0.027  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    3   3.832 ± 0.228  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    3   5.332 ± 0.173  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    3   8.417 ± 0.551  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    3  14.903 ± 0.782  ms/op
Hisilicon
base:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt   30   0.824 ± 0.003  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt   30   1.123 ± 0.050  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt   30   1.550 ± 0.052  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt   30   2.015 ± 0.040  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt   30   3.154 ± 0.032  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt   30   5.519 ± 0.044  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt   30   1.469 ± 0.196  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt   30   1.777 ± 0.097  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt   30   2.509 ± 0.073  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt   30   3.914 ± 0.044  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt   30   6.773 ± 0.049  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt   30  12.504 ± 0.081  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt   30   1.505 ± 0.107  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt   30   1.976 ± 0.145  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt   30   2.593 ± 0.082  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt   30   3.998 ± 0.062  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt   30   6.949 ± 0.110  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt   30  12.617 ± 0.068  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt   30   1.232 ± 0.038  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt   30   1.505 ± 0.008  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt   30   2.218 ± 0.066  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt   30   3.329 ± 0.119  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt   30   5.684 ± 0.030  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt   30  10.520 ± 0.031  ms/op
opt:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt   30   0.824 ± 0.003  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt   30   1.124 ± 0.032  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt   30   1.376 ± 0.123  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt   30   1.921 ± 0.040  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt   30   2.656 ± 0.156  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt   30   4.311 ± 0.267  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt   30   1.391 ± 0.154  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt   30   1.891 ± 0.170  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt   30   2.496 ± 0.082  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt   30   3.978 ± 0.046  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt   30   6.811 ± 0.057  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt   30  12.586 ± 0.054  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt   30   1.462 ± 0.085  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt   30   1.864 ± 0.070  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt   30   2.651 ± 0.090  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt   30   4.223 ± 0.383  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt   30   6.858 ± 0.085  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt   30  12.675 ± 0.099  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt   30   1.200 ± 0.013  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt   30   1.336 ± 0.156  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt   30   2.364 ± 0.545  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt   30   2.753 ± 0.154  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt   30   5.179 ± 0.834  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt   30   7.090 ± 0.423  ms/op
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4871:
>
>> 4869: // exit from large loop when less than 64 bytes left to read or we're about
>> 4870: // to prefetch memory behind array border
>> 4871: int largeLoopExitCondition = MAX(64, SoftwarePrefetchHintDistance)/(isLL ? 1 : 2);
>
> This breaks the Windows AArch64 build:
>
>
> Creating support/modules_libs/java.base/server/jvm.dll from 1051 file(s)
> d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(4871): error C3861: 'MAX': identifier not found
> make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm
>
>
> https://github.com/Wanghuang-Huawei/jdk/runs/3260986937
>
> Should probably be left as `MAX2`.
Thanks, I'll fix it.
-------------
PR: https://git.openjdk.java.net/jdk/pull/4722