RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic [v6]
Fei Yang
fyang at openjdk.org
Thu Aug 21 01:50:53 UTC 2025
On Wed, 20 Aug 2025 10:37:36 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>>
>> removed unneeded check for zero length; changed lmul from m8 to m2.
>
> More performance data for m2 vs m4 vs m8:
>
> ==========================================================================================================================
>                                                  -XX:-UseRVV      -XX:+UseRVV(m2)  -XX:+UseRVV(m4)  -XX:+UseRVV(m8)
> ==========================================================================================================================
> Benchmark                     (size)  Mode  Cnt    Score   Error    Score   Error    Score   Error    Score   Error  Units
> Int.differentSubrangeMatches     100  avgt   10  137.172 ± 0.054   98.497 ± 0.310   79.800 ± 0.279   69.906 ± 0.268  ns/op
> Int.differentSubrangeMatches     200  avgt   10  156.312 ± 0.281  140.852 ± 0.361  118.496 ± 1.082  103.428 ± 0.425  ns/op
> Int.differentSubrangeMatches     300  avgt   10  327.659 ± 0.317  191.959 ± 0.440  148.588 ± 1.106  138.767 ± 0.635  ns/op
> Int.differentSubrangeMatches     400  avgt   10  240.912 ± 0.429  230.264 ± 0.164  179.730 ± 0.306  170.405 ± 0.312  ns/op
> Int.differentSubrangeMatches     500  avgt   10  523.581 ± 0.292  286.112 ± 0.307  210.887 ± 0.311  172.616 ± 0.517  ns/op
> Int.differentSubrangeMatches     600  avgt   10  352.296 ± 0.480  322.362 ± 0.924  249.807 ± 0.290  201.274 ± 0.633  ns/op
> Int.differentSubrangeMatches     700  avgt   10  725.652 ± 0.555  382.037 ± 0.434  278.503 ± 0.633  245.203 ± 0.391  ns/op
> Int.differentSubrangeMatches     800  avgt   10  455.651 ± 1.003  412.241 ± 0.411  312.572 ± 0.475  271.538 ± 0.319  ns/op
> --------------------------------------------------------------------------------------------------------------------------
> Int.matches                      100  avgt   10  143.116 ± 0.627  128.433 ± 0.057  110.221 ± 0.056   95.322 ± 0.049  ns/op
> Int.matches                      200  avgt   10  227.868 ± 0.190  231.481 ± 0.343  172.225 ± 0.052  160.328 ± 0.019  ns/op
> Int.matches                      300  avgt   10  336.983 ± 0.094  301.416 ± 0.279  234.191 ± 0.036  199.844 ± 0.224  ns/op
> Int.matches                      400  avgt   10  440.492 ± 0.503  389.587 ± 0.752  312.521 ± 0.103  259.867 ± 0.030  ns/op
> Int.matches                      500  avgt   10  524.292 ± 0.828  490.197 ± 1.283  362.972 ± 0.847  297.545 ± 0.140  ns/op
> Int.matches                      600  avgt   10  627.717 ± 0.880  577.573 ± 0.764  420.304 ± 0.086  361.774 ± 0.720  ns/op
> Int.matches                      700  avgt   10  730.503 ± 0.281  719.430 ± 0.278  487.603 ± 2.297  397.502 ± 0.467  ns/op
> Int....
@ygaevsky: The posted JMH numbers show a performance regression for smaller sizes (< 64) in every case. There is also a regression for `Int.mismatchStart` at large sizes (>= 100). So I don't think the change is acceptable in its current shape. Is it possible to fix that?
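
For reference, here is a minimal sketch of how a single row can be classified as a regression. The class and method names are hypothetical and are not part of the PR or of JMH. It assumes `avgt` scores in ns/op (lower is better) and, as a deliberately conservative rule, counts a result as a regression only when the candidate's error interval lies entirely above the baseline's. The sample values are the `Int.matches` size 200 row quoted above.

```java
// Hypothetical helper for reading JMH "avgt" rows; not part of the PR or of JMH.
class RegressionCheck {

    // Ratio of candidate to baseline time; values above 1.0 mean the candidate is slower.
    static double slowdown(double baselineNs, double candidateNs) {
        return candidateNs / baselineNs;
    }

    // Conservative rule (an assumption of this sketch): flag a regression only
    // when the candidate interval [score - err, score + err] is strictly above
    // the baseline interval, i.e. the two intervals do not overlap.
    static boolean isRegression(double baselineNs, double baselineErr,
                                double candidateNs, double candidateErr) {
        return candidateNs - candidateErr > baselineNs + baselineErr;
    }

    public static void main(String[] args) {
        // Int.matches, size 200: -XX:-UseRVV vs -XX:+UseRVV(m2) and (m4).
        double base = 227.868, baseErr = 0.190;
        double m2 = 231.481, m2Err = 0.343;
        double m4 = 172.225, m4Err = 0.052;

        System.out.printf("m2 slowdown: %.3f, regression: %b%n",
                slowdown(base, m2), isRegression(base, baseErr, m2, m2Err));
        System.out.printf("m4 slowdown: %.3f, regression: %b%n",
                slowdown(base, m4), isRegression(base, baseErr, m4, m4Err));
    }
}
```

With this rule the m2 result for that row is flagged, while the m4 result is not. The same check can be applied to the rows for sizes below 64 and to `Int.mismatchStart`.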
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-3208655204