RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v2]
Andrew Haley
aph at openjdk.java.net
Mon Jul 12 15:41:55 UTC 2021
On Mon, 12 Jul 2021 09:14:25 GMT, Wang Huang <whuang at openjdk.org> wrote:
>> Dear all,
>> Can you do me a favor to review this patch. This patch use `ldp` to implement String.compareTo.
>>
>> * We add a JMH test case
>> * Here is the result of this test case
>>
>> Benchmark |(size)| Mode| Cnt|Score | Error |Units
>> ---------------------------------|------|-----|----|------|--------|-----
>> StringCompare.compareLL | 64 | avgt| 5 |7.992 | ± 0.005|us/op
>> StringCompare.compareLL | 72 | avgt| 5 |15.029| ± 0.006|us/op
>> StringCompare.compareLL | 80 | avgt| 5 |14.655| ± 0.011|us/op
>> StringCompare.compareLL | 91 | avgt| 5 |16.363| ± 0.12 |us/op
>> StringCompare.compareLL | 101 | avgt| 5 |16.966| ± 0.007|us/op
>> StringCompare.compareLL | 121 | avgt| 5 |19.276| ± 0.006|us/op
>> StringCompare.compareLL | 181 | avgt| 5 |19.002| ± 0.417|us/op
>> StringCompare.compareLL | 256 | avgt| 5 |24.707| ± 0.041|us/op
>> StringCompare.compareLLWithLdp| 64 | avgt| 5 |8.001 | ± 0.121|us/op
>> StringCompare.compareLLWithLdp| 72 | avgt| 5 |11.573| ± 0.003|us/op
>> StringCompare.compareLLWithLdp| 80 | avgt| 5 |6.861 | ± 0.004|us/op
>> StringCompare.compareLLWithLdp| 91 | avgt| 5 |12.774| ± 0.201|us/op
>> StringCompare.compareLLWithLdp| 101 | avgt| 5 |8.691 | ± 0.004|us/op
>> StringCompare.compareLLWithLdp| 121 | avgt| 5 |11.091| ± 1.342|us/op
>> StringCompare.compareLLWithLdp| 181 | avgt| 5 |14.64 | ± 0.581|us/op
>> StringCompare.compareLLWithLdp| 256 | avgt| 5 |25.879| ± 1.775|us/op
>> StringCompare.compareUU | 64 | avgt| 5 |13.476| ± 0.01 |us/op
>> StringCompare.compareUU | 72 | avgt| 5 |15.078| ± 0.006|us/op
>> StringCompare.compareUU | 80 | avgt| 5 |23.512| ± 0.011|us/op
>> StringCompare.compareUU | 91 | avgt| 5 |24.284| ± 0.008|us/op
>> StringCompare.compareUU | 101 | avgt| 5 |20.707| ± 0.017|us/op
>> StringCompare.compareUU | 121 | avgt| 5 |29.302| ± 0.011|us/op
>> StringCompare.compareUU | 181 | avgt| 5 |39.31 | ± 0.016|us/op
>> StringCompare.compareUU | 256 | avgt| 5 |54.592| ± 0.392|us/op
>> StringCompare.compareUUWithLdp| 64 | avgt| 5 |16.389| ± 0.008|us/op
>> StringCompare.compareUUWithLdp| 72 | avgt| 5 |10.71 | ± 0.158|us/op
>> StringCompare.compareUUWithLdp| 80 | avgt| 5 |11.488| ± 0.024|us/op
>> StringCompare.compareUUWithLdp| 91 | avgt| 5 |13.412| ± 0.006|us/op
>> StringCompare.compareUUWithLdp| 101 | avgt| 5 |16.245| ± 0.434|us/op
>> StringCompare.compareUUWithLdp| 121 | avgt| 5 |16.597| ± 0.016|us/op
>> StringCompare.compareUUWithLdp| 181 | avgt| 5 |27.373| ± 0.017|us/op
>> StringCompare.compareUUWithLdp| 256 | avgt| 5 |41.74 | ± 3.5 |us/op
>>
>> From this table, we can see that in most cases, our patch is better than old one.
>>
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
> draft of refactor
And with longer strings, M1 and ThunderX2:
Benchmark (diff_pos) (size) Mode Cnt Score Error Units
StringCompare.compareLLDiffStrings 1023 1024 avgt 3 50.849 ± 0.087 us/op
StringCompare.compareLLDiffStringsWithLdp 1023 1024 avgt 3 23.676 ± 0.015 us/op
StringCompare.compareLLDiffStringsWithRefactor 1023 1024 avgt 3 28.967 ± 0.168 us/op
StringCompare.compareLLDiffStrings 1023 1024 avgt 3 98.681 ± 0.026 us/op
StringCompare.compareLLDiffStringsWithLdp 1023 1024 avgt 3 82.576 ± 0.656 us/op
StringCompare.compareLLDiffStringsWithRefactor 1023 1024 avgt 3 98.801 ± 0.321 us/op
LDP wins on M1 here, but on ThunderX2 it makes almost no difference at all. And how often are we comparing such long strings?
I don't know what to think, really. It seems that we're near to a place where we're optimizing for micro-architecture, and I don't want to see that here. On the other hand, using LDP is not worse anywhere, so we should allow it.
-------------
PR: https://git.openjdk.java.net/jdk/pull/4722
More information about the hotspot-compiler-dev
mailing list