RFR: 8349556: RISC-V: improve the performance when -COH and -AvoidUnalignedAccesses for UL and LU string comparison

Hamlin Li mli at openjdk.org
Fri Feb 7 10:40:02 UTC 2025


On Fri, 7 Feb 2025 09:31:26 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Hi,
>> 
>> Can you help to review the patch?
>> 
>> It tries to improve the string compare when AvoidUnalignedAccesses == false && encoding is LU or UL (i.e. 2 strings encodings are different with each other).
>> The jmh test shows when `-CompactObjectHeaders` (i.e. -COH) && `-AvoidUnalignedAccesses`, the patch bring much better performance, and in other cases, it does not bring obvious regression. And currently by default it's -COH.
>> 
>> Thanks
>> 
>> ### Performance
>> 
>> -COH-AvoidUnalignedAccesses
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">
>> “-COH-Avoid” | (delta) | (size) | (utf16) | Mode | Cnt | Score - master | Score - patch | Error | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 24 | N/A | avgt | 10 | 6438443.073 | 6383881.891 | 36912.539 | ns/op | 0.009
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 36 | N/A | avgt | 10 | 9421176.34 | 9390907.1 | 21034.266 | ns/op | 0.003
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 72 | N/A | avgt | 10 | 18592342.33 | 16871350.38 | 15550.827 | ns/op | 0.102
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 128 | N/A | avgt | 10 | 30916157.05 | 28646961.11 | 9263.556 | ns/op | 0.079
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 256 | N/A | avgt | 10 | 58945069.17 | 55505097.77 | 8803.847 | ns/op | 0.062
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 512 | N/A | avgt | 10 | 115520355.5 | 110233842.8 | 35056.972 | ns/op | 0.048
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 2 | 24 | N/A | avgt | 10 | 7541299.83 | 7481385.995 | 43240.713 | ns/op | 0.008
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 2 | 36 | N/A | avgt | 10 | 10295051.77 | 10264978.04 | 38938.956 | ns/op | 0.003
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 2 | 72 | N/A | avgt | 10 | 19652419.64 | 17953481.41 | 10987.17 | ns/op | 0.095
>> com.arm.benchmarks.intrinsics.StringCompareToD...
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2528:
> 
>> 2526:              tmpL = isLU ? tmp1 : tmp2; // where to keep L for comparison
>> 2527: 
>> 2528:     if (AvoidUnalignedAccesses && (base_offset1 % 8) != 0) {
> 
> I find that a similar check is in `C2_MacroAssembler::string_compare` for the UU/LL cases [1].
> Seems more consistent if we move it into the counterpart `generate_compare_long_string_same_encoding`.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1443

Agree, we should refactor the code a bit to make it more readable.
As it seems just a refactor, so I can do it in another pr, how do you think about it?
At the same time I can also clean the invocation of `compare_string_16_bytes_same` from `generate_compare_long_string_same_encoding`, I don't like the implicit registers passing between them.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23495#discussion_r1946313359


More information about the hotspot-compiler-dev mailing list