RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic [v6]

Thu Aug 21 01:50:53 UTC 2025

On Wed, 20 Aug 2025 10:37:36 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   removed unneeded check for zero length; changed lmul from m8 to m2.
>
> More performance data for m2 vs m4 vs m8:
> 
> ==========================================================================================================================
>                                                     -XX:-UseRVV   -XX:+UseRVV(m2)  -XX:+UseRVV(m4)  -XX:+UseRVV(m8)
> ==========================================================================================================================
> Benchmark                     (size)  Mode  Cnt    Score   Error    Score   Error    Score   Error    Score   Error  Units
> Int.differentSubrangeMatches     100  avgt   10  137.172 ± 0.054   98.497 ± 0.310   79.800 ± 0.279   69.906 ± 0.268  ns/op
> Int.differentSubrangeMatches     200  avgt   10  156.312 ± 0.281  140.852 ± 0.361  118.496 ± 1.082  103.428 ± 0.425  ns/op
> Int.differentSubrangeMatches     300  avgt   10  327.659 ± 0.317  191.959 ± 0.440  148.588 ± 1.106  138.767 ± 0.635  ns/op
> Int.differentSubrangeMatches     400  avgt   10  240.912 ± 0.429  230.264 ± 0.164  179.730 ± 0.306  170.405 ± 0.312  ns/op
> Int.differentSubrangeMatches     500  avgt   10  523.581 ± 0.292  286.112 ± 0.307  210.887 ± 0.311  172.616 ± 0.517  ns/op
> Int.differentSubrangeMatches     600  avgt   10  352.296 ± 0.480  322.362 ± 0.924  249.807 ± 0.290  201.274 ± 0.633  ns/op
> Int.differentSubrangeMatches     700  avgt   10  725.652 ± 0.555  382.037 ± 0.434  278.503 ± 0.633  245.203 ± 0.391  ns/op
> Int.differentSubrangeMatches     800  avgt   10  455.651 ± 1.003  412.241 ± 0.411  312.572 ± 0.475  271.538 ± 0.319  ns/op
> --------------------------------------------------------------------------------------------------------------------------
> Int.matches                      100  avgt   10  143.116 ± 0.627  128.433 ± 0.057  110.221 ± 0.056   95.322 ± 0.049  ns/op
> Int.matches                      200  avgt   10  227.868 ± 0.190  231.481 ± 0.343  172.225 ± 0.052  160.328 ± 0.019  ns/op
> Int.matches                      300  avgt   10  336.983 ± 0.094  301.416 ± 0.279  234.191 ± 0.036  199.844 ± 0.224  ns/op
> Int.matches                      400  avgt   10  440.492 ± 0.503  389.587 ± 0.752  312.521 ± 0.103  259.867 ± 0.030  ns/op
> Int.matches                      500  avgt   10  524.292 ± 0.828  490.197 ± 1.283  362.972 ± 0.847  297.545 ± 0.140  ns/op
> Int.matches                      600  avgt   10  627.717 ± 0.880  577.573 ± 0.764  420.304 ± 0.086  361.774 ± 0.720  ns/op
> Int.matches                      700  avgt   10  730.503 ± 0.281  719.430 ± 0.278  487.603 ± 2.297  397.502 ± 0.467  ns/op
> Int....

@ygaevsky : The posted JMH numbers show a performance regression at smaller sizes (< 64) in every case, and a further regression for `Int.mismatchStart` at larger sizes (>= 100). I don't think the change is acceptable in its current shape. Is it possible to fix that?
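
To make "regression" concrete when reading the table, here is a minimal sketch in Java. It assumes a row counts as a regression only when the vectorized score is slower than the `-XX:-UseRVV` baseline by more than the two reported error margins combined. The class and method names (`RegressionCheck`, `isRegression`, `relativeChange`) are hypothetical and are not part of the PR or of JMH; the numbers are copied from the `Int.matches` row for size 200 in the quoted table.

```java
// Hypothetical helper, not part of the PR or of JMH.
class RegressionCheck {

    // True when the candidate is slower than the baseline even after
    // giving both measurements the benefit of their reported error.
    // Scores are avgt results in ns/op, so lower is better.
    static boolean isRegression(double baseScore, double baseError,
                                double candScore, double candError) {
        return candScore - candError > baseScore + baseError;
    }

    // Relative change of the candidate against the baseline;
    // a positive value means the candidate is slower.
    static double relativeChange(double baseScore, double candScore) {
        return (candScore - baseScore) / baseScore;
    }

    public static void main(String[] args) {
        // Int.matches, size 200, from the quoted table.
        double baseScore = 227.868, baseError = 0.190; // -XX:-UseRVV
        double m2Score = 231.481, m2Error = 0.343;     // -XX:+UseRVV (m2)
        double m8Score = 160.328, m8Error = 0.019;     // -XX:+UseRVV (m8)

        System.out.println("m2 regression: "
                + isRegression(baseScore, baseError, m2Score, m2Error));
        System.out.println("m8 regression: "
                + isRegression(baseScore, baseError, m8Score, m8Error));
        System.out.println("m2 relative change: "
                + relativeChange(baseScore, m2Score));
    }
}
```

Under that assumption the m2 column for `Int.matches` at size 200 is flagged (roughly 1.6% slower than the baseline), while the m8 column is not. The same check could be applied to the rows for sizes < 64 and to `Int.mismatchStart`, which are cut off in the quote above.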

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-3208655204