RFR: 8334554: RISC-V: verify & fix perf of string comparison

Hamlin Li mli at openjdk.org
Mon Jun 24 14:46:13 UTC 2024


On Mon, 24 Jun 2024 14:40:10 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2330:
>> 
>>> 2328: void C2_MacroAssembler::element_compare(Register a1, Register a2, Register result, Register cnt, Register tmp1, Register tmp2,
>>> 2329:                                         VectorRegister vr1, VectorRegister vr2, VectorRegister vrs, bool islatin, Label &DONE,
>>> 2330:                                         bool is_m2) {
>> 
>> How about adding an `Assembler::LMUL LMUL` param instead? And should we pass a larger `Assembler::m4` only for the vlen=128 case (that is, when `MaxVectorSize` is 16)? As I mentioned in [[1]](https://github.com/openjdk/jdk/pull/18382#discussion_r1645356197), an LMUL larger than needed can sometimes even hurt performance on hardware like the banana-pi (vlen=256), which seems kind of strange to me.
>> 
>> Performance impact on banana-pi (vlen=256):
>> Before:
>> 
>> Benchmark                                   (delta)  (size)  Mode  Cnt      Score      Error  Units
>> StringCompareToDifferentLength.compareToLL        2      24  avgt    9   4556.938 ±  909.960  us/op
>> StringCompareToDifferentLength.compareToLL        2      36  avgt    9   4613.250 ±  891.120  us/op
>> StringCompareToDifferentLength.compareToLL        2      72  avgt    9   5792.938 ±  545.470  us/op
>> StringCompareToDifferentLength.compareToLL        2     128  avgt    9   5884.248 ± 1089.558  us/op
>> StringCompareToDifferentLength.compareToLL        2     256  avgt    9   8506.465 ±  197.376  us/op
>> StringCompareToDifferentLength.compareToLL        2     512  avgt    9  14349.963 ±  253.898  us/op
>> StringCompareToDifferentLength.compareToLU        2      24  avgt    9   6084.199 ± 5148.464  us/op
>> StringCompareToDifferentLength.compareToLU        2      36  avgt    9   5194.196 ±  927.611  us/op
>> StringCompareToDifferentLength.compareToLU        2      72  avgt    9   7332.861 ±  909.214  us/op
>> StringCompareToDifferentLength.compareToLU        2     128  avgt    9   7043.723 ±  159.843  us/op
>> StringCompareToDifferentLength.compareToLU        2     256  avgt    9  11718.996 ±  552.570  us/op
>> StringCompareToDifferentLength.compareToLU        2     512  avgt    9  20471.987 ±  314.224  us/op
>> StringCompareToDifferentLength.compareToUL        2      24  avgt    9   5371.997 ± 1002.623  us/op
>> StringCompareToDifferentLength.compareToUL        2      36  avgt    9   5469.605 ± 1119.210  us/op
>> StringCompareToDifferentLength.compareToUL        2      72  avgt    9   7249.683 ±  154.028 ...
>
> The `Error` column seems huge for the `compareToLL` tests.

This is not good news for us (RISC-V), as it means we need to tune the LMUL for each specific intrinsic. I'm also not sure whether different boards with the same vector length will call for different LMUL values.
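
To make the per-vlen selection discussed above concrete, here is a rough standalone sketch, not the actual patch. The `LMUL` enum is a hypothetical stand-in for `Assembler::LMUL`, the helper names are made up for illustration, and falling back to m2 for anything other than `MaxVectorSize` == 16 is an assumption based only on the existing `is_m2` flag.

```cpp
// Hypothetical stand-in for Assembler::LMUL; the real enum is defined in
// HotSpot's RISC-V assembler and is not reproduced here.
enum class LMUL { m1, m2, m4, m8 };

// Assumption: MaxVectorSize is in bytes, so 16 corresponds to vlen=128.
// Use the larger m4 grouping only for vlen=128; otherwise fall back to m2
// (the m2 fallback is an assumption, not something measured here).
constexpr LMUL select_compare_lmul(int max_vector_size_bytes) {
  return max_vector_size_bytes == 16 ? LMUL::m4 : LMUL::m2;
}

// Number of vector registers grouped together for a given LMUL.
constexpr int lmul_factor(LMUL lmul) {
  switch (lmul) {
    case LMUL::m1: return 1;
    case LMUL::m2: return 2;
    case LMUL::m4: return 4;
    case LMUL::m8: return 8;
  }
  return 1; // not reached for valid enum values
}

// Bytes covered by one vector iteration under the selection above.
constexpr int bytes_per_iteration(int max_vector_size_bytes) {
  return max_vector_size_bytes *
         lmul_factor(select_compare_lmul(max_vector_size_bytes));
}
```

Under these assumptions both vlen=128 (m4) and vlen=256 (m2) cover 64 bytes per iteration. Whether that is the right choice for every board with the same vector length is exactly the open question, so any such table would need to be backed by measurements per intrinsic.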

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19825#discussion_r1651162840

