RFR: 8310268: RISC-V: misaligned memory access in String.Compare intrinsic
Vladimir Kempik
vkempik at openjdk.org
Mon Jun 26 12:45:02 UTC 2023
On Mon, 26 Jun 2023 08:34:39 GMT, Fei Yang <fyang at openjdk.org> wrote:
> > The compare_short (in c2_MacroAssembler) was doing too many conditional branches in one place; it was possible to reduce them slightly. Thanks for looking at it.
>
> I haven't checked other changes, but for the possible unaligned accesses in file src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#LL958-L971
>
> It looks like simple calls to the load_X_misaligned assembler routines with proper granularity will suffice, which would be much simpler.
I have made a simplified version of the c2_MacroAssembler_riscv.cpp patch: https://github.com/VladimirKempik/jdk/commit/656af81f1aa3f026cf3e1868b3813c7488b2775f
It basically works like this: if the tail is 8 bytes, use ld/lwu; otherwise use load_X_misaligned.
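For readers less familiar with the idea, here is a minimal sketch in plain C++ of what a load "with granularity" amounts to. It is only a model, not the HotSpot assembler code: the helper names load_le_bytewise and load_u64_chunked are hypothetical, and each chunk read stands in for what would be a single narrower load instruction in generated code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): read `size` bytes at p as a
// little-endian value using only single-byte reads, so no access in this
// model can be misaligned.
static uint64_t load_le_bytewise(const unsigned char* p, size_t size) {
    uint64_t v = 0;
    for (size_t i = 0; i < size; ++i) {
        v |= static_cast<uint64_t>(p[i]) << (8 * i);
    }
    return v;
}

// Hypothetical helper (not from the patch): assemble a 64-bit little-endian
// value from chunks of `granularity` bytes (1, 2, 4 or 8). A granularity of 8
// models one wide load; smaller values model several narrower loads that are
// shifted and OR-ed together.
static uint64_t load_u64_chunked(const unsigned char* p, size_t granularity) {
    uint64_t v = 0;
    for (size_t off = 0; off < 8; off += granularity) {
        v |= load_le_bytewise(p + off, granularity) << (8 * off);
    }
    return v;
}
```

Every granularity yields the same value; the difference is the number of loads and combining steps, which is the extra work a misaligned-safe path adds compared to one wide load.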
Results on hifive (I made a version of the JMH test with small string lengths):
hifive, current pr:
Benchmark (delta) (size) Mode Cnt Score Error Units
StringCompareToDifferentLength.compareToLL 2 7 avgt 9 7.826 ± 0.803 ms/op
StringCompareToDifferentLength.compareToLL 2 8 avgt 9 8.510 ± 0.884 ms/op
StringCompareToDifferentLength.compareToLL 2 15 avgt 9 7.171 ± 0.957 ms/op
StringCompareToDifferentLength.compareToLL 2 24 avgt 9 6.469 ± 0.701 ms/op
StringCompareToDifferentLength.compareToLL 2 36 avgt 9 7.970 ± 0.578 ms/op
StringCompareToDifferentLength.compareToLU 2 7 avgt 9 8.700 ± 0.583 ms/op
StringCompareToDifferentLength.compareToLU 2 8 avgt 9 8.079 ± 0.910 ms/op
StringCompareToDifferentLength.compareToLU 2 15 avgt 9 11.577 ± 0.650 ms/op
StringCompareToDifferentLength.compareToLU 2 24 avgt 9 13.612 ± 0.436 ms/op
StringCompareToDifferentLength.compareToLU 2 36 avgt 9 17.866 ± 0.922 ms/op
StringCompareToDifferentLength.compareToUL 2 7 avgt 9 8.755 ± 0.561 ms/op
StringCompareToDifferentLength.compareToUL 2 8 avgt 9 10.201 ± 0.633 ms/op
StringCompareToDifferentLength.compareToUL 2 15 avgt 9 11.568 ± 0.459 ms/op
StringCompareToDifferentLength.compareToUL 2 24 avgt 9 15.762 ± 0.630 ms/op
StringCompareToDifferentLength.compareToUL 2 36 avgt 9 19.614 ± 0.677 ms/op
StringCompareToDifferentLength.compareToUU 2 7 avgt 9 7.463 ± 0.306 ms/op
StringCompareToDifferentLength.compareToUU 2 8 avgt 9 6.102 ± 0.978 ms/op
StringCompareToDifferentLength.compareToUU 2 15 avgt 9 8.144 ± 1.073 ms/op
StringCompareToDifferentLength.compareToUU 2 24 avgt 9 9.413 ± 0.959 ms/op
StringCompareToDifferentLength.compareToUU 2 36 avgt 9 11.012 ± 0.345 ms/op
hifive, from compare_lam2 branch:
Benchmark (delta) (size) Mode Cnt Score Error Units
StringCompareToDifferentLength.compareToLL 2 7 avgt 9 7.899 ± 0.761 ms/op
StringCompareToDifferentLength.compareToLL 2 8 avgt 9 8.635 ± 0.626 ms/op
StringCompareToDifferentLength.compareToLL 2 15 avgt 9 8.663 ± 0.647 ms/op
StringCompareToDifferentLength.compareToLL 2 24 avgt 9 7.015 ± 0.889 ms/op
StringCompareToDifferentLength.compareToLL 2 36 avgt 9 10.199 ± 0.671 ms/op
StringCompareToDifferentLength.compareToLU 2 7 avgt 9 9.685 ± 0.991 ms/op
StringCompareToDifferentLength.compareToLU 2 8 avgt 9 8.402 ± 0.650 ms/op
StringCompareToDifferentLength.compareToLU 2 15 avgt 9 12.259 ± 0.753 ms/op
StringCompareToDifferentLength.compareToLU 2 24 avgt 9 13.637 ± 0.828 ms/op
StringCompareToDifferentLength.compareToLU 2 36 avgt 9 18.201 ± 0.994 ms/op
StringCompareToDifferentLength.compareToUL 2 7 avgt 9 11.866 ± 0.791 ms/op
StringCompareToDifferentLength.compareToUL 2 8 avgt 9 10.466 ± 0.568 ms/op
StringCompareToDifferentLength.compareToUL 2 15 avgt 9 14.092 ± 0.331 ms/op
StringCompareToDifferentLength.compareToUL 2 24 avgt 9 15.518 ± 0.314 ms/op
StringCompareToDifferentLength.compareToUL 2 36 avgt 9 19.325 ± 0.452 ms/op
StringCompareToDifferentLength.compareToUU 2 7 avgt 9 7.422 ± 0.565 ms/op
StringCompareToDifferentLength.compareToUU 2 8 avgt 9 6.409 ± 0.748 ms/op
StringCompareToDifferentLength.compareToUU 2 15 avgt 9 8.357 ± 0.983 ms/op
StringCompareToDifferentLength.compareToUU 2 24 avgt 9 9.453 ± 0.911 ms/op
StringCompareToDifferentLength.compareToUU 2 36 avgt 9 11.561 ± 0.745 ms/op
There is a clear performance degradation when we have to go into TAIL (sizes 15 and 36 for LL; 7 and 15 for LU/UL).
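To put a number on the gap for a given row, the relative change between the two tables can be computed as below. This is only an illustrative helper; the name relative_change_percent is hypothetical and not part of the benchmark.

```cpp
// Hypothetical helper: relative change of `candidate` versus `baseline`,
// in percent. Positive values mean the candidate score (ms/op) is higher,
// i.e. slower.
static double relative_change_percent(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}
```

For example, compareToLL at size 36 goes from 7.970 to 10.199 ms/op, roughly 28% slower, and compareToUL at size 7 goes from 8.755 to 11.866 ms/op, roughly 36% slower.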
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14534#issuecomment-1607389045