RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic [v2]
Yuri Gaevsky
duke at openjdk.org
Fri May 16 08:22:04 UTC 2025
On Thu, 24 Apr 2025 17:27:39 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> Hello All,
>>
>> Please review these changes, which enable the _vectorizedMismatch intrinsic on RISC-V platforms that support RVV instructions.
>>
>> Thank you,
>> -Yuri Gaevsky
>>
>> **Correctness checks:**
>> hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.
>
> Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge master
> - 8324124: RISC-V: implement _vectorizedMismatch intrinsic
More fine-grained data on the same hardware as above:
Legend: UseVMI ==> UseVectorizedMismatchIntrinsic
---------------------------------------------------------------------------------------------
(baseline) (patch)
---------------------------------------------------------------------------------------------
|-XX:-UseVMI -XX:+UseRVV|-XX:+UseVMI -XX:+UseRVV|
---------------------------------------------------------------------------------------------
Benchmark (size) Mode Cnt Score Error Score Error Units
Byte.differentSubrangeMatches 100 avgt 10 229.256 ± 0.198 69.741 ± 0.275 ns/op
Byte.differentSubrangeMatches 200 avgt 10 201.288 ± 0.190 66.506 ± 0.067 ns/op
Byte.differentSubrangeMatches 300 avgt 10 536.125 ± 0.365 115.255 ± 1.541 ns/op
Byte.differentSubrangeMatches 400 avgt 10 556.112 ± 10.091 74.600 ± 0.364 ns/op
Byte.differentSubrangeMatches 500 avgt 10 870.934 ± 0.696 67.701 ± 0.044 ns/op
Byte.differentSubrangeMatches 600 avgt 10 425.375 ± 0.235 102.945 ± 0.337 ns/op
Byte.differentSubrangeMatches 700 avgt 10 1909.673 ± 53.585 103.073 ± 0.414 ns/op
Byte.differentSubrangeMatches 800 avgt 10 873.724 ± 0.232 102.817 ± 0.068 ns/op
Byte.matches 100 avgt 10 65.256 ± 0.048 95.762 ± 16.464 ns/op
Byte.matches 200 avgt 10 88.623 ± 0.149 97.243 ± 1.450 ns/op
Byte.matches 300 avgt 10 204.775 ± 4.687 92.739 ± 0.047 ns/op
Byte.matches 400 avgt 10 153.686 ± 0.278 141.477 ± 46.844 ns/op
Byte.matches 500 avgt 10 310.993 ± 5.080 91.547 ± 0.060 ns/op
Byte.matches 600 avgt 10 185.652 ± 0.102 124.798 ± 0.066 ns/op
Byte.matches 700 avgt 10 304.217 ± 124.206 124.744 ± 0.075 ns/op
Byte.matches 800 avgt 10 239.599 ± 0.110 158.000 ± 0.088 ns/op
Byte.mismatchEnd 100 avgt 10 60.562 ± 0.358 55.872 ± 0.357 ns/op
Byte.mismatchEnd 200 avgt 10 84.626 ± 0.038 56.092 ± 0.263 ns/op
Byte.mismatchEnd 300 avgt 10 114.080 ± 0.071 87.155 ± 0.062 ns/op
Byte.mismatchEnd 400 avgt 10 149.347 ± 0.254 87.151 ± 0.055 ns/op
Byte.mismatchEnd 500 avgt 10 180.149 ± 0.426 94.018 ± 0.050 ns/op
Byte.mismatchEnd 600 avgt 10 182.495 ± 0.103 122.891 ± 0.067 ns/op
Byte.mismatchEnd 700 avgt 10 222.248 ± 0.480 121.082 ± 0.082 ns/op
Byte.mismatchEnd 800 avgt 10 235.855 ± 0.338 154.226 ± 0.089 ns/op
Byte.mismatchMid 100 avgt 10 38.859 ± 0.122 55.244 ± 0.303 ns/op
Byte.mismatchMid 200 avgt 10 54.352 ± 0.517 55.070 ± 0.236 ns/op
Byte.mismatchMid 300 avgt 10 68.996 ± 0.057 69.280 ± 26.763 ns/op
Byte.mismatchMid 400 avgt 10 87.560 ± 0.389 61.302 ± 0.241 ns/op
Byte.mismatchMid 500 avgt 10 100.948 ± 0.061 55.135 ± 0.259 ns/op
Byte.mismatchMid 600 avgt 10 118.616 ± 0.130 87.158 ± 0.106 ns/op
Byte.mismatchMid 700 avgt 10 132.310 ± 0.112 92.138 ± 0.040 ns/op
Byte.mismatchMid 800 avgt 10 170.413 ± 62.978 90.976 ± 0.060 ns/op
Byte.mismatchStart 100 avgt 10 23.837 ± 0.034 55.365 ± 0.437 ns/op
Byte.mismatchStart 200 avgt 10 23.200 ± 0.012 55.399 ± 0.168 ns/op
Byte.mismatchStart 300 avgt 10 36.600 ± 0.763 54.907 ± 0.235 ns/op
Byte.mismatchStart 400 avgt 10 23.188 ± 0.010 54.653 ± 0.226 ns/op
Byte.mismatchStart 500 avgt 10 36.171 ± 0.325 55.461 ± 0.184 ns/op
Byte.mismatchStart 600 avgt 10 23.829 ± 0.048 55.495 ± 0.448 ns/op
Byte.mismatchStart 700 avgt 10 23.815 ± 0.007 56.140 ± 0.270 ns/op
Byte.mismatchStart 800 avgt 10 25.100 ± 0.034 53.920 ± 0.035 ns/op
---------------------------------------------------------------------------------------------
Short.differentSubrangeMatches 100 avgt 10 282.444 ± 0.153 70.040 ± 0.125 ns/op
Short.differentSubrangeMatches 200 avgt 10 135.741 ± 0.200 68.342 ± 0.057 ns/op
Short.differentSubrangeMatches 300 avgt 10 1277.808 ± 14.456 103.454 ± 0.120 ns/op
Short.differentSubrangeMatches 400 avgt 10 223.451 ± 0.293 171.933 ± 2.561 ns/op
Short.differentSubrangeMatches 500 avgt 10 2138.447 ± 25.749 171.097 ± 2.585 ns/op
Short.differentSubrangeMatches 600 avgt 10 314.485 ± 0.242 139.807 ± 0.079 ns/op
Short.differentSubrangeMatches 700 avgt 10 1754.537 ± 1.708 138.499 ± 0.027 ns/op
Short.differentSubrangeMatches 800 avgt 10 406.632 ± 0.264 171.009 ± 0.708 ns/op
Short.matches 100 avgt 10 84.960 ± 0.301 60.805 ± 0.017 ns/op
Short.matches 200 avgt 10 146.112 ± 0.165 100.379 ± 0.159 ns/op
Short.matches 300 avgt 10 179.960 ± 0.169 129.123 ± 0.070 ns/op
Short.matches 400 avgt 10 232.550 ± 0.407 162.544 ± 0.050 ns/op
Short.matches 500 avgt 10 284.438 ± 0.351 169.978 ± 0.114 ns/op
Short.matches 600 avgt 10 341.725 ± 0.382 198.988 ± 0.136 ns/op
Short.matches 700 avgt 10 394.701 ± 1.911 397.286 ± 4.721 ns/op
Short.matches 800 avgt 10 444.682 ± 0.399 260.794 ± 0.155 ns/op
Short.mismatchEnd 100 avgt 10 78.362 ± 0.026 54.080 ± 0.205 ns/op
Short.mismatchEnd 200 avgt 10 137.212 ± 0.102 86.555 ± 0.135 ns/op
Short.mismatchEnd 300 avgt 10 172.287 ± 0.125 120.342 ± 0.079 ns/op
Short.mismatchEnd 400 avgt 10 223.305 ± 0.159 153.056 ± 0.331 ns/op
Short.mismatchEnd 500 avgt 10 276.604 ± 0.122 153.631 ± 0.132 ns/op
Short.mismatchEnd 600 avgt 10 331.104 ± 1.805 186.172 ± 0.057 ns/op
Short.mismatchEnd 700 avgt 10 384.149 ± 1.776 225.087 ± 0.171 ns/op
Short.mismatchEnd 800 avgt 10 437.838 ± 0.401 257.156 ± 0.221 ns/op
Short.mismatchMid 100 avgt 10 51.411 ± 0.009 90.411 ± 0.519 ns/op
Short.mismatchMid 200 avgt 10 83.834 ± 0.685 90.383 ± 1.926 ns/op
Short.mismatchMid 300 avgt 10 112.123 ± 0.822 86.532 ± 0.033 ns/op
Short.mismatchMid 400 avgt 10 141.735 ± 0.667 90.890 ± 0.067 ns/op
Short.mismatchMid 500 avgt 10 169.345 ± 0.166 144.730 ± 3.164 ns/op
Short.mismatchMid 600 avgt 10 173.554 ± 0.153 123.545 ± 0.122 ns/op
Short.mismatchMid 700 avgt 10 194.393 ± 0.262 119.735 ± 0.054 ns/op
Short.mismatchMid 800 avgt 10 222.163 ± 0.390 158.130 ± 0.087 ns/op
Short.mismatchStart 100 avgt 10 22.566 ± 0.007 54.399 ± 0.293 ns/op
Short.mismatchStart 200 avgt 10 25.699 ± 0.011 54.923 ± 0.127 ns/op
Short.mismatchStart 300 avgt 10 25.702 ± 0.014 54.097 ± 0.254 ns/op
Short.mismatchStart 400 avgt 10 23.822 ± 0.009 99.891 ± 0.788 ns/op
Short.mismatchStart 500 avgt 10 25.721 ± 0.022 54.128 ± 0.266 ns/op
Short.mismatchStart 600 avgt 10 25.704 ± 0.007 53.493 ± 0.203 ns/op
Short.mismatchStart 700 avgt 10 40.538 ± 0.413 54.189 ± 0.184 ns/op
Short.mismatchStart 800 avgt 10 25.706 ± 0.015 57.905 ± 0.243 ns/op
---------------------------------------------------------------------------------------------
Int.differentSubrangeMatches 100 avgt 10 154.976 ± 0.633 69.036 ± 0.070 ns/op
Int.differentSubrangeMatches 200 avgt 10 156.642 ± 0.483 101.957 ± 0.316 ns/op
Int.differentSubrangeMatches 300 avgt 10 394.074 ± 1.545 138.248 ± 0.613 ns/op
Int.differentSubrangeMatches 400 avgt 10 241.063 ± 0.464 170.497 ± 0.551 ns/op
Int.differentSubrangeMatches 500 avgt 10 630.403 ± 0.356 172.495 ± 0.113 ns/op
Int.differentSubrangeMatches 600 avgt 10 356.048 ± 0.451 345.645 ± 5.213 ns/op
Int.differentSubrangeMatches 700 avgt 10 874.046 ± 7.070 239.370 ± 0.333 ns/op
Int.differentSubrangeMatches 800 avgt 10 767.832 ± 14.872 275.932 ± 0.195 ns/op
Int.matches 100 avgt 10 229.800 ± 67.886 99.062 ± 0.071 ns/op
Int.matches 200 avgt 10 228.766 ± 0.156 160.609 ± 0.131 ns/op
Int.matches 300 avgt 10 337.579 ± 0.437 198.166 ± 0.145 ns/op
Int.matches 400 avgt 10 442.784 ± 2.386 264.596 ± 0.155 ns/op
Int.matches 500 avgt 10 902.691 ± 10.068 511.427 ± 6.414 ns/op
Int.matches 600 avgt 10 626.352 ± 0.474 615.184 ± 6.823 ns/op
Int.matches 700 avgt 10 733.358 ± 0.209 397.746 ± 0.280 ns/op
Int.matches 800 avgt 10 835.602 ± 0.761 464.160 ± 0.309 ns/op
Int.mismatchEnd 100 avgt 10 133.112 ± 0.202 86.483 ± 0.034 ns/op
Int.mismatchEnd 200 avgt 10 351.115 ± 69.363 153.029 ± 0.104 ns/op
Int.mismatchEnd 300 avgt 10 326.137 ± 0.202 189.315 ± 0.091 ns/op
Int.mismatchEnd 400 avgt 10 430.870 ± 0.388 346.663 ± 142.707 ns/op
Int.mismatchEnd 500 avgt 10 512.747 ± 0.476 296.076 ± 0.153 ns/op
Int.mismatchEnd 600 avgt 10 614.686 ± 0.501 352.447 ± 0.220 ns/op
Int.mismatchEnd 700 avgt 10 718.262 ± 0.872 390.380 ± 0.385 ns/op
Int.mismatchEnd 800 avgt 10 823.137 ± 1.092 456.725 ± 0.133 ns/op
Int.mismatchMid 100 avgt 10 87.525 ± 0.436 53.380 ± 0.099 ns/op
Int.mismatchMid 200 avgt 10 140.749 ± 0.345 90.916 ± 0.067 ns/op
Int.mismatchMid 300 avgt 10 173.054 ± 0.186 123.485 ± 0.056 ns/op
Int.mismatchMid 400 avgt 10 222.213 ± 0.251 152.981 ± 0.104 ns/op
Int.mismatchMid 500 avgt 10 274.683 ± 0.206 259.329 ± 2.498 ns/op
Int.mismatchMid 600 avgt 10 325.381 ± 0.138 197.549 ± 0.093 ns/op
Int.mismatchMid 700 avgt 10 373.725 ± 0.333 223.880 ± 0.116 ns/op
Int.mismatchMid 800 avgt 10 419.002 ± 0.539 252.679 ± 0.126 ns/op
Int.mismatchStart 100 avgt 10 29.408 ± 0.101 53.512 ± 0.507 ns/op
Int.mismatchStart 200 avgt 10 29.469 ± 2.875 89.489 ± 2.456 ns/op
Int.mismatchStart 300 avgt 10 28.874 ± 0.136 89.833 ± 2.523 ns/op
Int.mismatchStart 400 avgt 10 28.853 ± 0.020 96.711 ± 1.345 ns/op
Int.mismatchStart 500 avgt 10 28.830 ± 0.009 53.488 ± 0.177 ns/op
Int.mismatchStart 600 avgt 10 28.839 ± 0.028 53.415 ± 0.118 ns/op
Int.mismatchStart 700 avgt 10 28.835 ± 0.019 54.031 ± 0.143 ns/op
Int.mismatchStart 800 avgt 10 28.858 ± 0.108 75.214 ± 28.338 ns/op
---------------------------------------------------------------------------------------------
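To read the table as speedups, one can divide the baseline score by the patch score for each row. Below is a minimal Java sketch of that arithmetic, using three rows copied from the table above. The class name `SpeedupSketch` and the helper `speedup` are illustrative only and are not part of the patch or of the benchmark.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of the patch): turns two ns/op scores
// from the table into a speedup ratio.
class SpeedupSketch {

    // Ratio > 1.0 means the patched run (+UseVMI) is faster than baseline (-UseVMI).
    static double speedup(double baselineNsPerOp, double patchNsPerOp) {
        return baselineNsPerOp / patchNsPerOp;
    }

    public static void main(String[] args) {
        // {baseline score, patch score} in ns/op, copied from the table above.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("Byte.differentSubrangeMatches (size 500)", new double[] {870.934, 67.701});
        rows.put("Int.matches (size 800)", new double[] {835.602, 464.160});
        rows.put("Byte.mismatchStart (size 100)", new double[] {23.837, 55.365});

        for (Map.Entry<String, double[]> row : rows.entrySet()) {
            double ratio = speedup(row.getValue()[0], row.getValue()[1]);
            System.out.printf("%-45s %.2fx%n", row.getKey(), ratio);
        }
    }
}
```

A ratio above 1 means the patched run is faster. The mismatchStart rows come out below 1, meaning the patched run is slower than the baseline for those cases.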
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2885999588