RFR: 8310268: RISC-V: misaligned memory access in String.Compare intrinsic

Vladimir Kempik vkempik at openjdk.org
Mon Jun 26 12:45:02 UTC 2023


On Mon, 26 Jun 2023 08:34:39 GMT, Fei Yang <fyang at openjdk.org> wrote:

> > The compare_short (in c2_MacroAssembler) was doing too many conditional branches in one place; it was possible to reduce them slightly. Thanks for looking at it.
> 
> I haven't checked other changes, but for the possible unaligned accesses in file src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#LL958-L971
> 
> It looks like simple calls to the load_X_misaligned assemblers with proper granularity will suffice, which would be much simpler.

I have made a simplified version of the c2_MacroAssembler_riscv.cpp patch: https://github.com/VladimirKempik/jdk/commit/656af81f1aa3f026cf3e1868b3813c7488b2775f

It basically works like this: if the tail is 8 bytes, use ld/lwu; otherwise, use load_X_misaligned.
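
To make the rule concrete, here is a minimal host-side C++ sketch of that dispatch. It is not the HotSpot code: `load_bytewise`, `load_wide` and `load_tail` are hypothetical names, `load_wide` only stands in for a single `ld`, and `load_bytewise` stands in for a byte-granularity `load_X_misaligned`. It also assumes a little-endian host, as on RISC-V.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a byte-granularity misaligned load:
// assemble n (<= 8) bytes into a zero-extended little-endian value,
// so no single access is wider than one byte.
inline std::uint64_t load_bytewise(const unsigned char* p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return v;
}

// Hypothetical stand-in for one 8-byte load (ld). memcpy keeps this
// well-defined in C++ for any address; the value matches load_bytewise
// only on a little-endian host.
inline std::uint64_t load_wide(const unsigned char* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// The rule from the text: a full 8-byte tail takes the single wide load,
// anything shorter goes through the byte-wise path.
// The caller guarantees tail_bytes <= 8.
inline std::uint64_t load_tail(const unsigned char* p, std::size_t tail_bytes) {
    return tail_bytes == 8 ? load_wide(p) : load_bytewise(p, tail_bytes);
}
```

The byte-wise path costs several loads, shifts and ORs per tail, which is the kind of overhead the numbers below measure.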
Results on hifive (I made a version of the JMH test with small string lengths):

hifive, current pr:

Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2       7  avgt    9   7.826 ± 0.803  ms/op
StringCompareToDifferentLength.compareToLL        2       8  avgt    9   8.510 ± 0.884  ms/op
StringCompareToDifferentLength.compareToLL        2      15  avgt    9   7.171 ± 0.957  ms/op
StringCompareToDifferentLength.compareToLL        2      24  avgt    9   6.469 ± 0.701  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    9   7.970 ± 0.578  ms/op
StringCompareToDifferentLength.compareToLU        2       7  avgt    9   8.700 ± 0.583  ms/op
StringCompareToDifferentLength.compareToLU        2       8  avgt    9   8.079 ± 0.910  ms/op
StringCompareToDifferentLength.compareToLU        2      15  avgt    9  11.577 ± 0.650  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    9  13.612 ± 0.436  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    9  17.866 ± 0.922  ms/op
StringCompareToDifferentLength.compareToUL        2       7  avgt    9   8.755 ± 0.561  ms/op
StringCompareToDifferentLength.compareToUL        2       8  avgt    9  10.201 ± 0.633  ms/op
StringCompareToDifferentLength.compareToUL        2      15  avgt    9  11.568 ± 0.459  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    9  15.762 ± 0.630  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    9  19.614 ± 0.677  ms/op
StringCompareToDifferentLength.compareToUU        2       7  avgt    9   7.463 ± 0.306  ms/op
StringCompareToDifferentLength.compareToUU        2       8  avgt    9   6.102 ± 0.978  ms/op
StringCompareToDifferentLength.compareToUU        2      15  avgt    9   8.144 ± 1.073  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    9   9.413 ± 0.959  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    9  11.012 ± 0.345  ms/op


hifive, from compare_lam2 branch:

Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2       7  avgt    9   7.899 ± 0.761  ms/op
StringCompareToDifferentLength.compareToLL        2       8  avgt    9   8.635 ± 0.626  ms/op
StringCompareToDifferentLength.compareToLL        2      15  avgt    9   8.663 ± 0.647  ms/op
StringCompareToDifferentLength.compareToLL        2      24  avgt    9   7.015 ± 0.889  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    9  10.199 ± 0.671  ms/op
StringCompareToDifferentLength.compareToLU        2       7  avgt    9   9.685 ± 0.991  ms/op
StringCompareToDifferentLength.compareToLU        2       8  avgt    9   8.402 ± 0.650  ms/op
StringCompareToDifferentLength.compareToLU        2      15  avgt    9  12.259 ± 0.753  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    9  13.637 ± 0.828  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    9  18.201 ± 0.994  ms/op
StringCompareToDifferentLength.compareToUL        2       7  avgt    9  11.866 ± 0.791  ms/op
StringCompareToDifferentLength.compareToUL        2       8  avgt    9  10.466 ± 0.568  ms/op
StringCompareToDifferentLength.compareToUL        2      15  avgt    9  14.092 ± 0.331  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    9  15.518 ± 0.314  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    9  19.325 ± 0.452  ms/op
StringCompareToDifferentLength.compareToUU        2       7  avgt    9   7.422 ± 0.565  ms/op
StringCompareToDifferentLength.compareToUU        2       8  avgt    9   6.409 ± 0.748  ms/op
StringCompareToDifferentLength.compareToUU        2      15  avgt    9   8.357 ± 0.983  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    9   9.453 ± 0.911  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    9  11.561 ± 0.745  ms/op


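There is a clear performance degradation with the compare_lam2 branch whenever we have to go into TAIL (sizes 15 and 36 for LL; sizes 7 and 15 for LU/UL). For example, LL at size 36 goes from 7.970 to 10.199 ms/op, and UL at size 7 from 8.755 to 11.866 ms/op.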
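
For comparing rows between the two tables, a small helper like the following can be used. This is only an illustrative sketch: `slowdown_percent` and `intervals_disjoint` are hypothetical names, and the interval check is a crude heuristic over the Score and Error columns, not a proper statistical test.

```cpp
#include <cmath>

// Relative change in percent. Scores are ms/op, so a positive result
// means the candidate is slower than the baseline.
inline double slowdown_percent(double baseline_ms, double candidate_ms) {
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}

// Crude significance check: true when the two "score ± error"
// intervals do not overlap at all.
inline bool intervals_disjoint(double a, double a_err, double b, double b_err) {
    return std::fabs(a - b) > a_err + b_err;
}
```

With the LL size-36 rows above, `slowdown_percent(7.970, 10.199)` comes out at roughly 28%, and `intervals_disjoint(7.970, 0.578, 10.199, 0.671)` is true.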

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14534#issuecomment-1607389045

