RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic [v6]
Yuri Gaevsky
duke at openjdk.org
Wed Aug 20 10:41:44 UTC 2025
On Tue, 12 Aug 2025 19:54:14 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> Hello All,
>>
>> Please review these changes to enable the _vectorizedMismatch intrinsic on the RISC-V platform when RVV instructions are supported.
>>
>> Thank you,
>> -Yuri Gaevsky
>>
>> **Correctness checks:**
>> hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> removed unneeded check for zero length; changed lmul from m8 to m2.
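
For context on what the intrinsic has to compute: the Java fallback of jdk.internal.util.ArraysSupport.vectorizedMismatch compares the two arrays in 8-byte words and returns either the relative index of the first mismatching element or, when the word-sized prefix matches, the bitwise complement of the number of remaining elements that the caller still has to check. The sketch below models only that return convention for int[] inputs (two ints per 8-byte word) in plain Java without Unsafe. It is not the JDK fallback and not the RVV stub from this PR; the names MismatchModel, vectorizedMismatchInts and mismatch are hypothetical.

```java
import java.util.Arrays;

public class MismatchModel {
    // Hypothetical plain-Java model of the vectorizedMismatch contract for int[]
    // (log2ArrayIndexScale == 2). NOT the JDK fallback (which uses Unsafe 8-byte
    // loads) and NOT the RVV stub; it only reproduces the return convention.
    static int vectorizedMismatchInts(int[] a, int[] b, int length) {
        final int intsPerWord = 2;                        // one 8-byte word holds two ints
        int wordLimit = length - (length % intsPerWord);  // elements covered by whole words
        int i = 0;
        for (; i < wordLimit; i += intsPerWord) {
            // Checking the lower index first matches a little-endian XOR + trailing-zero scan.
            if (a[i] != b[i]) return i;
            if (a[i + 1] != b[i + 1]) return i + 1;
        }
        int tail = length - i;  // elements not examined (0 or 1 for int[])
        return ~tail;           // negative result: bitwise complement of the tail length
    }

    // Hypothetical caller in the style of ArraysSupport.mismatch: finish the tail
    // element by element and return -1 when the first `length` elements are equal.
    static int mismatch(int[] a, int[] b, int length) {
        int r = vectorizedMismatchInts(a, b, length);
        if (r >= 0) return r;
        for (int i = length - ~r; i < length; i++) {
            if (a[i] != b[i]) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4, 5};
        int[] y = {1, 2, 3, 4, 6};
        System.out.println(vectorizedMismatchInts(x, y, 5)); // -2, i.e. ~1: one tail element left
        System.out.println(mismatch(x, y, 5));               // 4
        System.out.println(Arrays.mismatch(x, y));           // 4, same answer from the public API
    }
}
```

The negative "~tail" result is why a vector stub may stop at a convenient boundary and leave a short remainder to the caller instead of handling every element itself.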

More performance data comparing LMUL m2, m4 and m8 (JMH average time in ns/op; lower is better):
=============================================================================================================================
                                                 -XX:-UseRVV       -XX:+UseRVV (m2)  -XX:+UseRVV (m4)  -XX:+UseRVV (m8)
=============================================================================================================================
Benchmark                     (size)  Mode  Cnt    Score   Error     Score   Error     Score   Error     Score   Error  Units
Int.differentSubrangeMatches     100  avgt   10  137.172 ± 0.054    98.497 ± 0.310    79.800 ± 0.279    69.906 ± 0.268  ns/op
Int.differentSubrangeMatches     200  avgt   10  156.312 ± 0.281   140.852 ± 0.361   118.496 ± 1.082   103.428 ± 0.425  ns/op
Int.differentSubrangeMatches     300  avgt   10  327.659 ± 0.317   191.959 ± 0.440   148.588 ± 1.106   138.767 ± 0.635  ns/op
Int.differentSubrangeMatches     400  avgt   10  240.912 ± 0.429   230.264 ± 0.164   179.730 ± 0.306   170.405 ± 0.312  ns/op
Int.differentSubrangeMatches     500  avgt   10  523.581 ± 0.292   286.112 ± 0.307   210.887 ± 0.311   172.616 ± 0.517  ns/op
Int.differentSubrangeMatches     600  avgt   10  352.296 ± 0.480   322.362 ± 0.924   249.807 ± 0.290   201.274 ± 0.633  ns/op
Int.differentSubrangeMatches     700  avgt   10  725.652 ± 0.555   382.037 ± 0.434   278.503 ± 0.633   245.203 ± 0.391  ns/op
Int.differentSubrangeMatches     800  avgt   10  455.651 ± 1.003   412.241 ± 0.411   312.572 ± 0.475   271.538 ± 0.319  ns/op
-----------------------------------------------------------------------------------------------------------------------------
Int.matches                      100  avgt   10  143.116 ± 0.627   128.433 ± 0.057   110.221 ± 0.056    95.322 ± 0.049  ns/op
Int.matches                      200  avgt   10  227.868 ± 0.190   231.481 ± 0.343   172.225 ± 0.052   160.328 ± 0.019  ns/op
Int.matches                      300  avgt   10  336.983 ± 0.094   301.416 ± 0.279   234.191 ± 0.036   199.844 ± 0.224  ns/op
Int.matches                      400  avgt   10  440.492 ± 0.503   389.587 ± 0.752   312.521 ± 0.103   259.867 ± 0.030  ns/op
Int.matches                      500  avgt   10  524.292 ± 0.828   490.197 ± 1.283   362.972 ± 0.847   297.545 ± 0.140  ns/op
Int.matches                      600  avgt   10  627.717 ± 0.880   577.573 ± 0.764   420.304 ± 0.086   361.774 ± 0.720  ns/op
Int.matches                      700  avgt   10  730.503 ± 0.281   719.430 ± 0.278   487.603 ± 2.297   397.502 ± 0.467  ns/op
Int.matches                      800  avgt   10  831.331 ± 0.446   810.678 ± 0.482   580.438 ± 0.966   472.532 ± 0.484  ns/op
-----------------------------------------------------------------------------------------------------------------------------
Int.mismatchEnd                  100  avgt   10  133.878 ± 0.434   106.791 ± 0.056    82.681 ± 0.070    86.513 ± 0.050  ns/op
Int.mismatchEnd                  200  avgt   10  220.972 ± 1.055   223.622 ± 0.110   170.348 ± 0.415   159.113 ± 0.195  ns/op
Int.mismatchEnd                  300  avgt   10  326.363 ± 0.069   294.368 ± 0.076   230.101 ± 0.934   190.400 ± 0.042  ns/op
Int.mismatchEnd                  400  avgt   10  432.284 ± 0.311   380.235 ± 0.096   288.662 ± 0.049   252.551 ± 0.123  ns/op
Int.mismatchEnd                  500  avgt   10  512.964 ± 0.139   466.615 ± 0.135   370.821 ± 0.071   285.593 ± 0.080  ns/op
Int.mismatchEnd                  600  avgt   10  613.120 ± 0.291   568.137 ± 0.120   414.635 ± 0.074   356.367 ± 0.084  ns/op
Int.mismatchEnd                  700  avgt   10  716.861 ± 0.667   709.291 ± 0.571   476.794 ± 0.399   404.384 ± 0.494  ns/op
Int.mismatchEnd                  800  avgt   10  821.902 ± 0.564   740.929 ± 0.241   542.111 ± 1.102   456.066 ± 0.119  ns/op
-----------------------------------------------------------------------------------------------------------------------------
Int.mismatchMid                  100  avgt   10   84.289 ± 0.221    77.660 ± 0.018    63.073 ± 0.145    53.226 ± 0.010  ns/op
Int.mismatchMid                  200  avgt   10  142.339 ± 0.228   120.884 ± 0.037   102.706 ± 0.020    86.411 ± 0.014  ns/op
Int.mismatchMid                  300  avgt   10  170.238 ± 0.248   164.259 ± 0.457   127.066 ± 0.624   120.582 ± 0.147  ns/op
Int.mismatchMid                  400  avgt   10  221.964 ± 0.555   207.503 ± 0.126   164.960 ± 0.069   152.796 ± 0.026  ns/op
Int.mismatchMid                  500  avgt   10  275.343 ± 0.848   252.248 ± 0.395   187.275 ± 0.017   157.191 ± 0.032  ns/op
Int.mismatchMid                  600  avgt   10  322.031 ± 0.173   317.887 ± 0.314   226.728 ± 0.052   186.001 ± 0.040  ns/op
Int.mismatchMid                  700  avgt   10  371.653 ± 0.259   337.068 ± 0.069   247.348 ± 0.034   219.186 ± 0.051  ns/op
Int.mismatchMid                  800  avgt   10  419.094 ± 0.087   394.663 ± 0.231   288.703 ± 0.094   252.383 ± 0.078  ns/op
-----------------------------------------------------------------------------------------------------------------------------
Int.mismatchStart                100  avgt   10   28.920 ± 0.179    34.449 ± 0.015    40.712 ± 0.007    53.854 ± 0.011  ns/op
Int.mismatchStart                200  avgt   10   28.845 ± 0.051    35.706 ± 0.022    40.710 ± 0.008    53.853 ± 0.007  ns/op
Int.mismatchStart                300  avgt   10   28.928 ± 0.051    34.444 ± 0.008    40.704 ± 0.017    53.234 ± 0.011  ns/op
Int.mismatchStart                400  avgt   10   29.369 ± 0.127    35.698 ± 0.008    40.702 ± 0.005    53.226 ± 0.011  ns/op
Int.mismatchStart                500  avgt   10   29.953 ± 0.595    34.488 ± 0.045    40.709 ± 0.017    54.288 ± 0.837  ns/op
Int.mismatchStart                600  avgt   10   28.809 ± 0.008    34.459 ± 0.011    41.957 ± 0.008    53.232 ± 0.009  ns/op
Int.mismatchStart                700  avgt   10   28.930 ± 0.124    35.702 ± 0.009    40.984 ± 0.092    54.495 ± 0.008  ns/op
Int.mismatchStart                800  avgt   10   28.814 ± 0.017    35.697 ± 0.009    40.711 ± 0.012    54.487 ± 0.013  ns/op
=============================================================================================================================
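I would say that m4 looks like the best overall choice for `size >= 100`: m8 is faster on most of the longer-running cases, but it is the slowest variant on mismatchStart (about 54 ns/op vs. about 41 ns/op for m4), while m2 is slower than m4 everywhere except mismatchStart. WDYT?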
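
One rough way to read the m2/m4/m8 trade-off is through the RVV 1.0 relation VLMAX = LMUL * VLEN / SEW: doubling LMUL doubles the number of elements one register group covers and halves the loop trip count when the arrays match, which is in line with m8 being fastest on Int.matches. On the other hand, an early mismatch still pays for a whole register group on the first trip, which may be part of why mismatchStart gets slower as LMUL grows. The sketch below only does that arithmetic. VLEN = 128 and SEW = 32 are assumptions (the post does not state the VLEN used, nor whether the stub compares ints or bytes), and LmulModel, vlmax and iterations are hypothetical names.

```java
public class LmulModel {
    // RVV 1.0: VLMAX = LMUL * VLEN / SEW (elements covered by one register group).
    static int vlmax(int vlenBits, int sewBits, int lmul) {
        return lmul * vlenBits / sewBits;
    }

    // Strip-mined loop trips needed to scan n elements when no mismatch is found:
    // each trip handles min(remaining, VLMAX) elements, so the count is ceil(n / VLMAX).
    static int iterations(int n, int vlmax) {
        return (n + vlmax - 1) / vlmax;
    }

    public static void main(String[] args) {
        int vlen = 128; // ASSUMPTION: VLEN in bits; the post does not say what QEMU was configured with
        int sew = 32;   // ASSUMPTION: int elements compared with SEW=32 (the stub might work on bytes)
        for (int lmul : new int[] {2, 4, 8}) {
            int vl = vlmax(vlen, sew, lmul);
            System.out.printf("m%d: VLMAX=%d ints, size 800 -> %d loop trips%n",
                              lmul, vl, iterations(800, vl));
        }
    }
}
```

Under these assumptions the program reports 100, 50 and 25 loop trips for m2, m4 and m8 at size 800. Trip count is only one factor; per-instruction cost at larger LMUL depends on the implementation (or emulator) and is not modeled here.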
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-3205509437
More information about the hotspot-dev mailing list