RFR: 8310268: RISC-V: misaligned memory access in String.Compare intrinsic [v6]
Fei Yang
fyang at openjdk.org
Tue Jul 25 06:42:42 UTC 2023
On Mon, 24 Jul 2023 14:16:02 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:
>> Please review this fix. it eliminates misaligned loads in String.compare on risc-v
>>
>> for small compares ( <= 72 bytes), the instrinsic in c2_MacroAssembler.cpp is used,
>> it reads ( in case of UU/LL) 8 bytes per loop, and at then end, it reads tail - misaligned load of last 8 bytes from the string.
>>
>> so if string length is not 8x bytes long then last load is misaligned, also it performs read/compare of some data which already was processed.
>>
>> I have changed that to compare only last length%8 bytes using SHORT_STRING part of intrinsic for UL/LU. But for UU/LL I have made an optimised version.
>>
>> Thanks to optimisations for conditional branching at line [947](https://github.com/openjdk/jdk/pull/14534/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R947) I’ve got no perf drop on thead ( with +AvoidUnalignedAccesses) which supports misaligned access.
>>
>> Improvements to inflate_XX() methods gives 3-5% improvements for UL/LU cases on thead, almost no perf change on hifive.
>>
>> for large strings, the instrinsics from stubGenerator.cpp is used
>> for UU/LL - generate_compare_long_string_same_encoding, I have just replaced misaligned ld with load_long_misaligned. Since this tail reading is not on hot path, this give some small penalty for thead when -XX:+AvoidUnalignedAccesses.
>>
>> large LU/UL comparision is done in compare_long_string_different_encoding in sutbGenerator.cpp:
>> These changes are partially based on feilongjiang's patch, but I have changed tail reading to prevent reading past the end of string. I have observed no perf difference between feilongjiang's and my version.
>>
>> This also enables regression test for string.Compare which previously was aarch64-only
>>
>> Testing: tier1 and tier2 clean on hifive.
>>
>> JMH testing, hifive:
>> before:
>>
>> Benchmark (delta) (size) Mode Cnt Score Error Units
>> StringCompareToDifferentLength.compareToLL 2 24 avgt 9 6.474 ± 1.475 ms/op
>> StringCompareToDifferentLength.compareToLL 2 36 avgt 9 125.823 ± 1.947 ms/op
>> StringCompareToDifferentLength.compareToLL 2 72 avgt 9 10.512 ± 0.236 ms/op
>> StringCompareToDifferentLength.compareToLL 2 128 avgt 9 13.032 ± 0.821 ms/op
>> StringCompareToDifferentLength.compareToLL 2 256 avgt 9 18.983 ± 0.318 ms/op
>> StringCompareToDifferentLength.compareToLL 2 512 avgt 9 29.925 ± ...
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
> remove unused branch
Hi, would you mind one more tweak? Thanks.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2364:
> 2362: tmpU = isLU ? tmp5 : tmp1, // where to keep U for comparison
> 2363: tmpL = isLU ? tmp1 : tmp5; // where to keep L for comparison
> 2364:
I see `cnt1`(x12) is unused now. We could make use of this to further eliminate use of both `tmp4` (x7) and `tmp5` (x31).
Let:
tmpU = isLU ? tmp2 : tmp1, // where to keep U for comparison
tmpL = isLU ? tmp1 : tmp2; // where to keep L for comparison
And use `cnt1` and `tmp2` instead where `tmp4` and `tmp5` are used respectively. This would help save one `mv` and saving & restoring of `tmp4` and `tmp5`. Also remember to change `tmpLval` from `tmp4` (x7) to `cnt1`(x12) in compare_string_8_x_LU.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2391:
> 2389:
> 2390: __ addi(t0, cnt2, wordSize);
> 2391: __ addi(cnt2, cnt2, wordSize*2); // amount of characters left to process
Indentation: s/__ addi(cnt2, cnt2, wordSize*2);/__ addi(cnt2, cnt2, wordSize * 2);/
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2410:
> 2408: __ bnez(tmp3, CALCULATE_DIFFERENCE);
> 2409:
> 2410: __ addi(strL, strL, wordSize/2); // Address of last 4 bytes in Latin1 string
Indentation: s/__ addi(strL, strL, wordSize/2);/__ addi(strL, strL, wordSize / 2);/
-------------
Changes requested by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14534#pullrequestreview-1544726633
PR Review Comment: https://git.openjdk.org/jdk/pull/14534#discussion_r1273057036
PR Review Comment: https://git.openjdk.org/jdk/pull/14534#discussion_r1273058345
PR Review Comment: https://git.openjdk.org/jdk/pull/14534#discussion_r1273059841
More information about the hotspot-dev
mailing list