RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic [v5]
Yuri Gaevsky
duke at openjdk.org
Tue Aug 12 12:35:13 UTC 2025
On Tue, 12 Aug 2025 05:58:16 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> Hello All,
>>
>> Please review these changes to enable the _vectorizedMismatch intrinsic on RISC-V platforms that support RVV instructions.
>>
>> Thank you,
>> -Yuri Gaevsky
>>
>> **Correctness checks:**
>> hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> - moved vector register declarations into vectorized_mismatch() body.
> - added initial support for ArrayOperationPartialInlineSize.
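
For readers unfamiliar with the operation being intrinsified, here is a minimal sketch of the semantics through the public `Arrays.mismatch` API. It is not the benchmark source. The helper `mismatchAt` and the class name are hypothetical, and reading the `mismatchStart` / `mismatchMid` / `mismatchEnd` benchmark names as "the differing element sits at the start / middle / end of the array" is an assumption based only on the names.

```java
import java.util.Arrays;

public class MismatchDemo {
    // Hypothetical helper: builds two equal int arrays of the given size,
    // changes one element of the second copy at index pos (pos < 0 leaves
    // the arrays equal), and returns the index reported by Arrays.mismatch.
    static int mismatchAt(int size, int pos) {
        int[] a = new int[size];
        Arrays.fill(a, 42);
        int[] b = a.clone();
        if (pos >= 0) {
            b[pos] = ~b[pos]; // guaranteed to differ from a[pos]
        }
        // Returns the index of the first differing element, or -1 if none.
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) {
        int size = 64;
        System.out.println("start: " + mismatchAt(size, 0));        // 0
        System.out.println("mid:   " + mismatchAt(size, size / 2)); // 32
        System.out.println("end:   " + mismatchAt(size, size - 1)); // 63
        System.out.println("equal: " + mismatchAt(size, -1));       // -1
    }
}
```

The later the first difference occurs, the more elements have to be compared, which is presumably why the "End" variants are the ones where array size matters most.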
Changeset `6341020` (BPI-F3 16g):
=================================================================================================================================
Benchmark           (size)  Mode  Cnt  -UseRVV          +UseRVV (m1)     +UseRVV (m2)     +UseRVV (m4)     +UseRVV (m8)     Units
=================================================================================================================================
Int.mismatchEnd          5  avgt   30   13.176 ± 0.016   13.154 ± 0.003   13.185 ± 0.018   13.157 ± 0.005   13.155 ± 0.003  ns/op
Int.mismatchEnd          8  avgt   30   29.410 ± 0.272   32.821 ± 0.362   35.331 ± 0.383   41.645 ± 0.153   55.595 ± 1.066  ns/op
Int.mismatchEnd         16  avgt   30   37.623 ± 0.421   46.344 ± 0.006   35.368 ± 0.358   41.687 ± 0.148   54.483 ± 0.007  ns/op
Int.mismatchEnd         32  avgt   30   57.399 ± 0.444   72.661 ± 0.024   50.619 ± 0.808   41.764 ± 0.192   54.265 ± 0.161  ns/op
Int.mismatchEnd         40  avgt   30   66.177 ± 0.279   85.869 ± 0.048   64.599 ± 1.164   62.919 ± 0.702   53.939 ± 0.356  ns/op
Int.mismatchEnd         50  avgt   30   75.398 ± 0.198   95.191 ± 0.014   65.218 ± 0.077   62.627 ± 0.013   53.862 ± 0.008  ns/op
Int.mismatchEnd         64  avgt   30   95.176 ± 0.558  121.951 ± 3.239   81.017 ± 1.581   63.682 ± 0.310   53.703 ± 0.379  ns/op
---------------------------------------------------------------------------------------------------------------------------------
Int.mismatchMid          5  avgt   30   28.893 ± 0.079   33.316 ± 0.075   36.333 ± 0.009   42.867 ± 0.963   59.605 ± 0.106  ns/op
Int.mismatchMid          8  avgt   30   25.275 ± 0.154   33.226 ± 0.026   36.343 ± 0.011   43.961 ± 0.040   58.947 ± 0.645  ns/op
Int.mismatchMid         16  avgt   30   34.135 ± 0.560   46.345 ± 0.007   34.921 ± 0.447   41.443 ± 0.307   55.132 ± 1.252  ns/op
Int.mismatchMid         32  avgt   30   42.832 ± 0.156   59.564 ± 0.064   51.378 ± 0.008   42.631 ± 0.646   54.488 ± 0.012  ns/op
Int.mismatchMid         40  avgt   30   47.237 ± 0.233   58.246 ± 1.199   50.526 ± 0.809   41.827 ± 0.127   54.509 ± 0.026  ns/op
Int.mismatchMid         50  avgt   30   52.497 ± 0.208   72.730 ± 0.065   50.539 ± 0.803   41.027 ± 0.228   54.484 ± 0.010  ns/op
Int.mismatchMid         64  avgt   30   62.040 ± 0.316   85.900 ± 0.070   65.797 ± 1.179   63.050 ± 0.796   54.285 ± 0.187  ns/op
---------------------------------------------------------------------------------------------------------------------------------
Int.mismatchStart        5  avgt   30   29.101 ± 0.176   32.602 ± 0.019   35.125 ± 0.031   42.606 ± 0.612   58.884 ± 0.031  ns/op
Int.mismatchStart        8  avgt   30   28.884 ± 0.071   33.223 ± 0.018   35.740 ± 0.033   43.837 ± 0.006   58.016 ± 1.691  ns/op
Int.mismatchStart       16  avgt   30   28.845 ± 0.044   33.205 ± 0.010   36.400 ± 0.040   42.792 ± 1.004   59.635 ± 0.100  ns/op
Int.mismatchStart       32  avgt   30   28.852 ± 0.082   32.790 ± 0.397   35.706 ± 0.010   41.960 ± 0.006   54.334 ± 0.144  ns/op
Int.mismatchStart       40  avgt   30   29.137 ± 0.295   32.787 ± 0.392   35.294 ± 0.404   41.956 ± 0.011   55.855 ± 1.753  ns/op
Int.mismatchStart       50  avgt   30   29.543 ± 0.359   32.611 ± 0.034   35.144 ± 0.076   41.391 ± 0.059   55.545 ± 1.604  ns/op
Int.mismatchStart       64  avgt   30   28.988 ± 0.112   32.786 ± 0.403   35.748 ± 0.053   43.089 ± 0.773   54.114 ± 0.192  ns/op
=================================================================================================================================
Byte.mismatchEnd         5  avgt   30   17.567 ± 0.043   18.193 ± 0.027   17.959 ± 0.199   18.221 ± 0.035   17.870 ± 0.184  ns/op
Byte.mismatchEnd         8  avgt   30   22.356 ± 0.390   35.551 ± 1.248   38.292 ± 1.557   44.816 ± 1.372   63.172 ± 0.025  ns/op
Byte.mismatchEnd        16  avgt   30   29.471 ± 0.396   35.766 ± 1.125   38.635 ± 1.098   47.061 ± 1.024   63.978 ± 0.115  ns/op
Byte.mismatchEnd        32  avgt   30   35.606 ± 0.352   35.006 ± 1.269   38.867 ± 1.013   44.779 ± 1.354   58.762 ± 0.061  ns/op
Byte.mismatchEnd        40  avgt   30   36.299 ± 0.233   49.543 ± 0.068   39.384 ± 0.069   45.629 ± 0.052   58.155 ± 0.058  ns/op
Byte.mismatchEnd        50  avgt   30   40.831 ± 0.400   48.221 ± 1.205   38.515 ± 1.186   44.252 ± 1.380   59.907 ± 1.547  ns/op
Byte.mismatchEnd        64  avgt   30   50.740 ± 0.269   49.471 ± 1.201   38.981 ± 1.240   43.970 ± 1.056   57.533 ± 1.517  ns/op
---------------------------------------------------------------------------------------------------------------------------------
Byte.mismatchMid         5  avgt   30   30.346 ± 1.820   28.527 ± 1.364   28.992 ± 1.322   28.991 ± 1.782   29.945 ± 1.755  ns/op
Byte.mismatchMid         8  avgt   30   21.933 ± 0.023   36.217 ± 1.245   37.457 ± 1.231   48.120 ± 0.046   63.750 ± 0.021  ns/op
Byte.mismatchMid        16  avgt   30   29.273 ± 0.146   35.789 ± 1.262   38.637 ± 1.210   44.467 ± 1.936   60.595 ± 3.065  ns/op
Byte.mismatchMid        32  avgt   30   27.632 ± 0.558   36.155 ± 1.253   39.978 ± 0.541   46.168 ± 1.093   60.817 ± 1.509  ns/op
Byte.mismatchMid        40  avgt   30   28.596 ± 0.400   37.000 ± 0.404   38.030 ± 1.664   47.003 ± 1.031   62.470 ± 1.401  ns/op
Byte.mismatchMid        50  avgt   30   35.171 ± 0.468   37.034 ± 0.428   39.550 ± 0.404   45.030 ± 0.958   57.389 ± 0.989  ns/op
Byte.mismatchMid        64  avgt   30   35.879 ± 0.672   49.512 ± 1.219   39.994 ± 0.047   44.351 ± 1.392   57.130 ± 2.729  ns/op
---------------------------------------------------------------------------------------------------------------------------------
Byte.mismatchStart       5  avgt   30   29.944 ± 1.876   30.086 ± 1.849   31.777 ± 0.808   30.560 ± 2.040   29.034 ± 1.821  ns/op
Byte.mismatchStart       8  avgt   30   22.159 ± 0.219   34.091 ± 1.294   38.037 ± 1.270   44.722 ± 1.341   58.877 ± 2.451  ns/op
Byte.mismatchStart      16  avgt   30   21.942 ± 0.021   34.797 ± 1.349   38.725 ± 1.277   46.050 ± 0.997   58.949 ± 0.664  ns/op
Byte.mismatchStart      32  avgt   30   22.166 ± 0.188   33.611 ± 1.230   40.017 ± 0.084   47.512 ± 0.643   63.875 ± 0.039  ns/op
Byte.mismatchStart      40  avgt   30   21.936 ± 0.013   34.460 ± 1.151   38.701 ± 1.280   46.868 ± 0.605   63.365 ± 0.120  ns/op
Byte.mismatchStart      50  avgt   30   22.136 ± 0.198   34.251 ± 1.299   37.158 ± 1.409   44.893 ± 1.654   61.724 ± 1.344  ns/op
Byte.mismatchStart      64  avgt   30   21.953 ± 0.037   34.875 ± 1.244   39.218 ± 1.389   48.289 ± 0.062   61.671 ± 2.012  ns/op
=================================================================================================================================
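
To make the table easier to read, the sketch below turns two of the rows into speedup ratios (baseline time divided by variant time, so values above 1.0 mean the RVV variant is faster). The class and method names are hypothetical. The numbers are copied from the `Int.mismatchEnd 64` and `Byte.mismatchStart 64` rows, and the error margins are ignored for simplicity.

```java
public class SpeedupDemo {
    // Ratio of baseline (-UseRVV) time to variant (+UseRVV) time, both in ns/op.
    // A result above 1.0 means the variant is faster than the baseline.
    static double speedup(double baselineNs, double variantNs) {
        return baselineNs / variantNs;
    }

    public static void main(String[] args) {
        // Int.mismatchEnd, size 64: -UseRVV vs +UseRVV (m8)
        System.out.printf("Int.mismatchEnd 64, m8:    %.2fx%n",
                speedup(95.176, 53.703));
        // Byte.mismatchStart, size 64: -UseRVV vs +UseRVV (m1)
        System.out.printf("Byte.mismatchStart 64, m1: %.2fx%n",
                speedup(21.953, 34.875));
    }
}
```

The first ratio is above 1.0 and the second is below it, which matches the overall pattern in the table: the vectorized path pays off when many elements must be compared and costs extra when the mismatch is found immediately.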
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-3179138714