RFR: 8334554: RISC-V: verify & fix perf of string comparison

Fei Yang fyang at openjdk.org
Mon Jun 24 07:26:12 UTC 2024


On Fri, 21 Jun 2024 10:33:40 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this patch?
> Thanks!
> 
> compare-UL/LU already uses m4, so this patch also uses m4 for compare-UU/LL.
> 
> ## Test
> tested on K230-CanMV, vlen = 128.
> warmup: 10 times
> iteration: 10 times
> 
> ### Before patch
> Benchmark | (size) | Score+rvv | Score-rvv | -rvv/+rvv
> -- | -- | -- | -- | --
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLL | 24 | 4242936.876 | 7227607.14 | 1.703444419
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLL | 36 | 5738695.363 | 8157070.353 | 1.421415468
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLL | 72 | 7163243.984 | 7209568.036 | 1.00646691
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLL | 128 | 8627566.301 | 12720927.51 | 1.474451435
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLL | 256 | 14632020.04 | 16291127.26 | 1.113388802
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLL | 512 | 26539410.59 | 23612505.95 | 0.8897147833
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 24 | 4913490.894 | 10454585.94 | 2.127730807
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 36 | 7230036.286 | 13561865.48 | 1.875767277
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 72 | 9525418.104 | 21901656.51 | 2.299285582
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 128 | 12645301.4 | 37351484.04 | 2.953783611
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 256 | 21147886.68 | 64886475.43 | 3.068225039
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 512 | 39738017.94 | 125169103.6 | 3.149857745
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 24 | 5183884.427 | 11040441.7 | 2.129762314
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 36 | 7421224.1 | 13879329.16 | 1.870221
> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 72 | 9739241.916 | 22346979.93 | 2.29452971
> com.arm.benchmarks.intrinsics.Stri...

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2330:

> 2328: void C2_MacroAssembler::element_compare(Register a1, Register a2, Register result, Register cnt, Register tmp1, Register tmp2,
> 2329:                                         VectorRegister vr1, VectorRegister vr2, VectorRegister vrs, bool islatin, Label &DONE,
> 2330:                                         bool is_m2) {

How about adding an `Assembler::LMUL LMUL` param instead? Also, should we pass the larger `Assembler::m4` only for the vlen=128 case (that is, when `MaxVectorSize` is 16)? As I mentioned in [[1]](https://github.com/openjdk/jdk/pull/18382#discussion_r1645356197), an LMUL larger than needed can sometimes even hurt performance on hardware like banana-pi (vlen=256), which is kind of strange to me.
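
To make the suggestion concrete, here is a minimal standalone sketch rather than actual HotSpot code. The `LMUL` enum and the `select_compare_lmul` helper are hypothetical stand-ins for `Assembler::LMUL` and for whatever the call site would do, and the fallback to m2 for wider vectors is an assumption based on the existing `is_m2` flag.

```cpp
#include <cassert>

// Hypothetical stand-in for Assembler::LMUL; only the values discussed here.
enum class LMUL { m1, m2, m4 };

// Pick the register group multiplier for compare-UU/LL.
// Assumption: only vlen=128 (MaxVectorSize == 16 bytes) gets the larger m4;
// wider vectors keep the smaller m2 grouping.
inline LMUL select_compare_lmul(int max_vector_size_bytes) {
  return max_vector_size_bytes == 16 ? LMUL::m4 : LMUL::m2;
}

// Quick self-check of the two cases mentioned in this review.
inline void check_select_compare_lmul() {
  assert(select_compare_lmul(16) == LMUL::m4);  // K230-CanMV, vlen=128
  assert(select_compare_lmul(32) == LMUL::m2);  // banana-pi, vlen=256
}
```

With something along these lines, `element_compare` could take the selected value directly instead of a `bool is_m2`.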

Performance impact on banana-pi (vlen=256):
Before:

Benchmark                                   (delta)  (size)  Mode  Cnt      Score      Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    9   4556.938 ±  909.960  us/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    9   4613.250 ±  891.120  us/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    9   5792.938 ±  545.470  us/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    9   5884.248 ± 1089.558  us/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    9   8506.465 ±  197.376  us/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    9  14349.963 ±  253.898  us/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    9   6084.199 ± 5148.464  us/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    9   5194.196 ±  927.611  us/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    9   7332.861 ±  909.214  us/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    9   7043.723 ±  159.843  us/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    9  11718.996 ±  552.570  us/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    9  20471.987 ±  314.224  us/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    9   5371.997 ± 1002.623  us/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    9   5469.605 ± 1119.210  us/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    9   7249.683 ±  154.028  us/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    9   7246.081 ±  100.914  us/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    9  11832.147 ±  316.674  us/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    9  20682.433 ±  308.106  us/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    9   4655.008 ±  829.460  us/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    9   6312.919 ±  900.926  us/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    9   7638.259 ±  998.933  us/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    9   9121.624 ±  142.066  us/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    9  15420.062 ±  326.247  us/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    9  27808.510 ±   68.788  us/op


After:

Benchmark                                   (delta)  (size)  Mode  Cnt      Score       Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    9   7642.107 ±  4901.979  us/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    9   7696.503 ±  6510.610  us/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    9  12322.284 ± 12789.419  us/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    9  10534.939 ±  2275.612  us/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    9  10090.611 ±  3167.551  us/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    9  13326.976 ±  4893.826  us/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    9   5327.304 ±  1189.457  us/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    9   5185.305 ±   890.695  us/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    9   7091.095 ±   133.802  us/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    9   7249.605 ±   470.037  us/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    9  11597.915 ±   286.748  us/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    9  20350.769 ±   416.088  us/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    9   5455.452 ±   955.767  us/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    9   5223.355 ±   750.944  us/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    9   7236.984 ±   168.883  us/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    9   7247.966 ±   117.412  us/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    9  11888.766 ±   432.701  us/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    9  20784.217 ±   174.492  us/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    9   5240.293 ±   786.652  us/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    9   5152.938 ±   548.116  us/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    9   7263.508 ±   709.897  us/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    9   7357.423 ±   144.770  us/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    9  11609.226 ±   258.765  us/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    9  20333.481 ±   278.653  us/op
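
One rough way to read the two tables is to take the after/before ratio per row. The sketch below does that for a few rows copied from the tables above. The `Row` struct and helper names are made up for illustration, and since some error margins are very large (for example ± 4901.979 on compareToLL size 24), the ratios are only indicative.

```cpp
#include <cassert>
#include <cstdio>

// One benchmark row: scores in us/op, lower is better.
struct Row {
  const char* name;
  int size;
  double before_us;
  double after_us;
};

// Scores copied from the Before/After tables above (banana-pi, vlen=256).
static const Row kRows[] = {
  {"compareToLL", 24, 4556.938, 7642.107},
  {"compareToLL", 512, 14349.963, 13326.976},
  {"compareToUU", 24, 4655.008, 5240.293},
  {"compareToUU", 512, 27808.510, 20333.481},
};

// Ratio > 1.0 means the patched build is slower for that row.
inline double after_over_before(const Row& r) {
  assert(r.before_us > 0.0);
  return r.after_us / r.before_us;
}

inline void print_ratios() {
  for (const Row& r : kRows) {
    std::printf("%s size=%d after/before=%.3f\n",
                r.name, r.size, after_over_before(r));
  }
}
```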

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19825#discussion_r1650420232


More information about the hotspot-compiler-dev mailing list