RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v3]
Nick Gasson
ngasson at openjdk.java.net
Wed Jul 14 08:51:14 UTC 2021
On Tue, 13 Jul 2021 07:37:31 GMT, Wang Huang <whuang at openjdk.org> wrote:
>> Dear all,
>> Can you do me a favor to review this patch. This patch use `ldp` to implement String.compareTo.
>>
>> * We add a JMH test case
>> * Here is the result of this test case
>>
>> Benchmark |(size)| Mode| Cnt|Score | Error |Units
>> ---------------------------------|------|-----|----|------|--------|-----
>> StringCompare.compareLL | 64 | avgt| 5 |7.992 | ± 0.005|us/op
>> StringCompare.compareLL | 72 | avgt| 5 |15.029| ± 0.006|us/op
>> StringCompare.compareLL | 80 | avgt| 5 |14.655| ± 0.011|us/op
>> StringCompare.compareLL | 91 | avgt| 5 |16.363| ± 0.12 |us/op
>> StringCompare.compareLL | 101 | avgt| 5 |16.966| ± 0.007|us/op
>> StringCompare.compareLL | 121 | avgt| 5 |19.276| ± 0.006|us/op
>> StringCompare.compareLL | 181 | avgt| 5 |19.002| ± 0.417|us/op
>> StringCompare.compareLL | 256 | avgt| 5 |24.707| ± 0.041|us/op
>> StringCompare.compareLLWithLdp| 64 | avgt| 5 |8.001 | ± 0.121|us/op
>> StringCompare.compareLLWithLdp| 72 | avgt| 5 |11.573| ± 0.003|us/op
>> StringCompare.compareLLWithLdp| 80 | avgt| 5 |6.861 | ± 0.004|us/op
>> StringCompare.compareLLWithLdp| 91 | avgt| 5 |12.774| ± 0.201|us/op
>> StringCompare.compareLLWithLdp| 101 | avgt| 5 |8.691 | ± 0.004|us/op
>> StringCompare.compareLLWithLdp| 121 | avgt| 5 |11.091| ± 1.342|us/op
>> StringCompare.compareLLWithLdp| 181 | avgt| 5 |14.64 | ± 0.581|us/op
>> StringCompare.compareLLWithLdp| 256 | avgt| 5 |25.879| ± 1.775|us/op
>> StringCompare.compareUU | 64 | avgt| 5 |13.476| ± 0.01 |us/op
>> StringCompare.compareUU | 72 | avgt| 5 |15.078| ± 0.006|us/op
>> StringCompare.compareUU | 80 | avgt| 5 |23.512| ± 0.011|us/op
>> StringCompare.compareUU | 91 | avgt| 5 |24.284| ± 0.008|us/op
>> StringCompare.compareUU | 101 | avgt| 5 |20.707| ± 0.017|us/op
>> StringCompare.compareUU | 121 | avgt| 5 |29.302| ± 0.011|us/op
>> StringCompare.compareUU | 181 | avgt| 5 |39.31 | ± 0.016|us/op
>> StringCompare.compareUU | 256 | avgt| 5 |54.592| ± 0.392|us/op
>> StringCompare.compareUUWithLdp| 64 | avgt| 5 |16.389| ± 0.008|us/op
>> StringCompare.compareUUWithLdp| 72 | avgt| 5 |10.71 | ± 0.158|us/op
>> StringCompare.compareUUWithLdp| 80 | avgt| 5 |11.488| ± 0.024|us/op
>> StringCompare.compareUUWithLdp| 91 | avgt| 5 |13.412| ± 0.006|us/op
>> StringCompare.compareUUWithLdp| 101 | avgt| 5 |16.245| ± 0.434|us/op
>> StringCompare.compareUUWithLdp| 121 | avgt| 5 |16.597| ± 0.016|us/op
>> StringCompare.compareUUWithLdp| 181 | avgt| 5 |27.373| ± 0.017|us/op
>> StringCompare.compareUUWithLdp| 256 | avgt| 5 |41.74 | ± 3.5 |us/op
>>
>> From this table, we can see that in most cases, our patch is better than old one.
>>
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
> refact codes
Have you tested when the data in the `byte[]` array is not 16B aligned? With the default JVM options the array data naturally starts 16B into the object, but you can force a different alignment with e.g. `-XX:-UseCompressedClassPointers`. I tried that on N1 and it's very slightly slower than with the 16B alignment, but still faster than the non-LDP version for length 1024 strings. On A72 the difference is a bit bigger but again faster than non-LDP.
N1, -UseCompressedClassPointers
Benchmark (diff_pos) (size) Mode Cnt Score Error Units
StringCompare.compareLLDiffStrings 1023 1024 avgt 5 67.789 ? 0.095 us/op
StringCompare.compareLLDiffStringsWithLdp 1023 1024 avgt 5 45.912 ? 0.059 us/op
StringCompare.compareUUDiffStrings 1023 1024 avgt 5 133.365 ? 0.086 us/op
StringCompare.compareUUDiffStringsWithLdp 1023 1024 avgt 5 89.009 ? 0.312 us/op
N1, +UseCompressedClassPointers
Benchmark (diff_pos) (size) Mode Cnt Score Error Units
StringCompare.compareLLDiffStrings 1023 1024 avgt 5 67.878 ? 0.156 us/op
StringCompare.compareLLDiffStringsWithLdp 1023 1024 avgt 5 46.487 ? 0.115 us/op
StringCompare.compareUUDiffStrings 1023 1024 avgt 5 133.576 ? 0.111 us/op
StringCompare.compareUUDiffStringsWithLdp 1023 1024 avgt 5 90.462 ? 0.176 us/op
A72, -UseCompressedClassPointers
Benchmark (diff_pos) (size) Mode Cnt Score Error Units
StringCompare.compareLLDiffStrings 1023 1024 avgt 5 122.697 ? 0.235 us/op
StringCompare.compareLLDiffStringsWithLdp 1023 1024 avgt 5 73.883 ? 0.136 us/op
StringCompare.compareUUDiffStrings 1023 1024 avgt 5 243.022 ? 0.457 us/op
StringCompare.compareUUDiffStringsWithLdp 1023 1024 avgt 5 150.659 ? 0.313 us/op
A72, +UseCompressedClassPointers
Benchmark (diff_pos) (size) Mode Cnt Score Error Units
StringCompare.compareLLDiffStrings 1023 1024 avgt 5 122.754 ? 0.563 us/op
StringCompare.compareLLDiffStringsWithLdp 1023 1024 avgt 5 68.513 ? 0.282 us/op
StringCompare.compareUUDiffStrings 1023 1024 avgt 5 243.043 ? 0.882 us/op
StringCompare.compareUUDiffStringsWithLdp 1023 1024 avgt 5 139.152 ? 0.333 us/op
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4966:
> 4964:
> 4965: __ bind(LESS8); // directly load last 8 bytes
> 4966: if(!isLL) {
The indentation of the `if` seems wrong here: shouldn't it line up with the assembly block like the `if` on line 4989? Also needs a space before the parenthesis.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4990:
> 4988: __ lsrv(tmp2, tmp2, rscratch2);
> 4989: if (isLL) {
> 4990: __ uxtbw(tmp1, tmp1);
Convention is to indent with two spaces but have four here.
-------------
PR: https://git.openjdk.java.net/jdk/pull/4722
More information about the hotspot-compiler-dev
mailing list