RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic [v6]

Wed Aug 20 10:41:44 UTC 2025

On Tue, 12 Aug 2025 19:54:14 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> Hello All,
>> 
>> Please review these changes to enable the _vectorizedMismatch intrinsic on RISC-V platforms that support RVV instructions.
>> 
>> Thank you,
>> -Yuri Gaevsky
>> 
>> **Correctness checks:**
>>   hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
> 
>   removed unneeded check for zero length; changed lmul from m8 to m2.

More performance data for m2 vs m4 vs m8:

==========================================================================================================================
                                                    -XX:-UseRVV   -XX:+UseRVV(m2)  -XX:+UseRVV(m4)  -XX:+UseRVV(m8)
==========================================================================================================================
Benchmark                     (size)  Mode  Cnt    Score   Error    Score   Error    Score   Error    Score   Error  Units
Int.differentSubrangeMatches     100  avgt   10  137.172 ± 0.054   98.497 ± 0.310   79.800 ± 0.279   69.906 ± 0.268  ns/op
Int.differentSubrangeMatches     200  avgt   10  156.312 ± 0.281  140.852 ± 0.361  118.496 ± 1.082  103.428 ± 0.425  ns/op
Int.differentSubrangeMatches     300  avgt   10  327.659 ± 0.317  191.959 ± 0.440  148.588 ± 1.106  138.767 ± 0.635  ns/op
Int.differentSubrangeMatches     400  avgt   10  240.912 ± 0.429  230.264 ± 0.164  179.730 ± 0.306  170.405 ± 0.312  ns/op
Int.differentSubrangeMatches     500  avgt   10  523.581 ± 0.292  286.112 ± 0.307  210.887 ± 0.311  172.616 ± 0.517  ns/op
Int.differentSubrangeMatches     600  avgt   10  352.296 ± 0.480  322.362 ± 0.924  249.807 ± 0.290  201.274 ± 0.633  ns/op
Int.differentSubrangeMatches     700  avgt   10  725.652 ± 0.555  382.037 ± 0.434  278.503 ± 0.633  245.203 ± 0.391  ns/op
Int.differentSubrangeMatches     800  avgt   10  455.651 ± 1.003  412.241 ± 0.411  312.572 ± 0.475  271.538 ± 0.319  ns/op
--------------------------------------------------------------------------------------------------------------------------
Int.matches                      100  avgt   10  143.116 ± 0.627  128.433 ± 0.057  110.221 ± 0.056   95.322 ± 0.049  ns/op
Int.matches                      200  avgt   10  227.868 ± 0.190  231.481 ± 0.343  172.225 ± 0.052  160.328 ± 0.019  ns/op
Int.matches                      300  avgt   10  336.983 ± 0.094  301.416 ± 0.279  234.191 ± 0.036  199.844 ± 0.224  ns/op
Int.matches                      400  avgt   10  440.492 ± 0.503  389.587 ± 0.752  312.521 ± 0.103  259.867 ± 0.030  ns/op
Int.matches                      500  avgt   10  524.292 ± 0.828  490.197 ± 1.283  362.972 ± 0.847  297.545 ± 0.140  ns/op
Int.matches                      600  avgt   10  627.717 ± 0.880  577.573 ± 0.764  420.304 ± 0.086  361.774 ± 0.720  ns/op
Int.matches                      700  avgt   10  730.503 ± 0.281  719.430 ± 0.278  487.603 ± 2.297  397.502 ± 0.467  ns/op
Int.matches                      800  avgt   10  831.331 ± 0.446  810.678 ± 0.482  580.438 ± 0.966  472.532 ± 0.484  ns/op
--------------------------------------------------------------------------------------------------------------------------
Int.mismatchEnd                  100  avgt   10  133.878 ± 0.434  106.791 ± 0.056   82.681 ± 0.070   86.513 ± 0.050  ns/op
Int.mismatchEnd                  200  avgt   10  220.972 ± 1.055  223.622 ± 0.110  170.348 ± 0.415  159.113 ± 0.195  ns/op
Int.mismatchEnd                  300  avgt   10  326.363 ± 0.069  294.368 ± 0.076  230.101 ± 0.934  190.400 ± 0.042  ns/op
Int.mismatchEnd                  400  avgt   10  432.284 ± 0.311  380.235 ± 0.096  288.662 ± 0.049  252.551 ± 0.123  ns/op
Int.mismatchEnd                  500  avgt   10  512.964 ± 0.139  466.615 ± 0.135  370.821 ± 0.071  285.593 ± 0.080  ns/op
Int.mismatchEnd                  600  avgt   10  613.120 ± 0.291  568.137 ± 0.120  414.635 ± 0.074  356.367 ± 0.084  ns/op
Int.mismatchEnd                  700  avgt   10  716.861 ± 0.667  709.291 ± 0.571  476.794 ± 0.399  404.384 ± 0.494  ns/op
Int.mismatchEnd                  800  avgt   10  821.902 ± 0.564  740.929 ± 0.241  542.111 ± 1.102  456.066 ± 0.119  ns/op
--------------------------------------------------------------------------------------------------------------------------
Int.mismatchMid                  100  avgt   10   84.289 ± 0.221   77.660 ± 0.018   63.073 ± 0.145   53.226 ± 0.010  ns/op
Int.mismatchMid                  200  avgt   10  142.339 ± 0.228  120.884 ± 0.037  102.706 ± 0.020   86.411 ± 0.014  ns/op
Int.mismatchMid                  300  avgt   10  170.238 ± 0.248  164.259 ± 0.457  127.066 ± 0.624  120.582 ± 0.147  ns/op
Int.mismatchMid                  400  avgt   10  221.964 ± 0.555  207.503 ± 0.126  164.960 ± 0.069  152.796 ± 0.026  ns/op
Int.mismatchMid                  500  avgt   10  275.343 ± 0.848  252.248 ± 0.395  187.275 ± 0.017  157.191 ± 0.032  ns/op
Int.mismatchMid                  600  avgt   10  322.031 ± 0.173  317.887 ± 0.314  226.728 ± 0.052  186.001 ± 0.040  ns/op
Int.mismatchMid                  700  avgt   10  371.653 ± 0.259  337.068 ± 0.069  247.348 ± 0.034  219.186 ± 0.051  ns/op
Int.mismatchMid                  800  avgt   10  419.094 ± 0.087  394.663 ± 0.231  288.703 ± 0.094  252.383 ± 0.078  ns/op
--------------------------------------------------------------------------------------------------------------------------
Int.mismatchStart                100  avgt   10   28.920 ± 0.179   34.449 ± 0.015   40.712 ± 0.007   53.854 ± 0.011  ns/op
Int.mismatchStart                200  avgt   10   28.845 ± 0.051   35.706 ± 0.022   40.710 ± 0.008   53.853 ± 0.007  ns/op
Int.mismatchStart                300  avgt   10   28.928 ± 0.051   34.444 ± 0.008   40.704 ± 0.017   53.234 ± 0.011  ns/op
Int.mismatchStart                400  avgt   10   29.369 ± 0.127   35.698 ± 0.008   40.702 ± 0.005   53.226 ± 0.011  ns/op
Int.mismatchStart                500  avgt   10   29.953 ± 0.595   34.488 ± 0.045   40.709 ± 0.017   54.288 ± 0.837  ns/op
Int.mismatchStart                600  avgt   10   28.809 ± 0.008   34.459 ± 0.011   41.957 ± 0.008   53.232 ± 0.009  ns/op
Int.mismatchStart                700  avgt   10   28.930 ± 0.124   35.702 ± 0.009   40.984 ± 0.092   54.495 ± 0.008  ns/op
Int.mismatchStart                800  avgt   10   28.814 ± 0.017   35.697 ± 0.009   40.711 ± 0.012   54.487 ± 0.013  ns/op
==========================================================================================================================
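
For readers less familiar with the benchmark rows, here is a minimal sketch of the scenarios they appear to cover. It assumes the row names mean what they suggest (identical arrays, or a single differing element at the start, middle, or end) and that `Arrays.mismatch` is the public entry point that ends up in the intrinsic for `int[]` inputs. The class `MismatchScenarios` and the helper `withDifferenceAt` are hypothetical names used only for illustration; they are not taken from the benchmark source.

```java
import java.util.Arrays;

class MismatchScenarios {
    // Hypothetical helper: returns a copy of 'base' whose element at 'index' differs.
    public static int[] withDifferenceAt(int[] base, int index) {
        int[] copy = base.clone();
        copy[index] ^= 1; // flipping the lowest bit guarantees a different value
        return copy;
    }

    public static void main(String[] args) {
        int size = 800; // largest size used in the table
        int[] a = new int[size];
        Arrays.setAll(a, i -> i);

        // "matches": identical contents, so the whole array has to be scanned.
        System.out.println("matches:       " + Arrays.mismatch(a, a.clone()));
        // "mismatchStart": the first element differs, so the scan stops at once.
        System.out.println("mismatchStart: " + Arrays.mismatch(a, withDifferenceAt(a, 0)));
        // "mismatchMid": the scan stops about halfway through.
        System.out.println("mismatchMid:   " + Arrays.mismatch(a, withDifferenceAt(a, size / 2)));
        // "mismatchEnd": only the last element differs.
        System.out.println("mismatchEnd:   " + Arrays.mismatch(a, withDifferenceAt(a, size - 1)));
    }
}
```

In the mismatchStart case the amount of work is independent of the array length, which would be consistent with the flat scores in that block of the table.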

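I would say that m4 looks like the better general choice for sizes >= 100 compared with m2 and m8: it is faster than m2 everywhere except mismatchStart, and its slowdown in mismatchStart is noticeably smaller than that of m8. WDYT?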
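
One way to put the three LMUL settings side by side is to turn the scores into speedups over the `-XX:-UseRVV` baseline. The sketch below does that for the size = 800 rows only, with the scores copied from the table above. The class name `LmulSpeedup`, its helpers, and the use of a geometric mean as a summary are my own choices for illustration, not part of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LmulSpeedup {
    // Speedup over the scalar baseline; values above 1.0 mean the variant is faster.
    public static double speedup(double baselineNs, double variantNs) {
        return baselineNs / variantNs;
    }

    // Geometric mean, a common way to summarize a set of ratios.
    public static double geoMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        // ns/op for size = 800, in column order: -UseRVV, m2, m4, m8.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("differentSubrangeMatches", new double[] {455.651, 412.241, 312.572, 271.538});
        rows.put("matches",                  new double[] {831.331, 810.678, 580.438, 472.532});
        rows.put("mismatchEnd",              new double[] {821.902, 740.929, 542.111, 456.066});
        rows.put("mismatchMid",              new double[] {419.094, 394.663, 288.703, 252.383});
        rows.put("mismatchStart",            new double[] { 28.814,  35.697,  40.711,  54.487});

        String[] lmul = {"m2", "m4", "m8"};
        double[][] perLmul = new double[lmul.length][rows.size()];
        int r = 0;
        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] s = e.getValue();
            StringBuilder line = new StringBuilder(String.format("%-26s", e.getKey()));
            for (int k = 0; k < lmul.length; k++) {
                perLmul[k][r] = speedup(s[0], s[k + 1]);
                line.append(String.format(" %s=%.2fx", lmul[k], perLmul[k][r]));
            }
            System.out.println(line);
            r++;
        }
        // One summary figure per LMUL across the five benchmarks.
        for (int k = 0; k < lmul.length; k++) {
            System.out.printf("geomean %s = %.2fx%n", lmul[k], geoMean(perLmul[k]));
        }
    }
}
```

A geometric mean weights all five benchmarks equally, so the result depends on how much the early-mismatch case matters in practice; a different weighting could shift the picture.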

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-3205509437