RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic [v2]

Yuri Gaevsky duke at openjdk.org
Fri May 16 08:22:04 UTC 2025


On Thu, 24 Apr 2025 17:27:39 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> Hello All,
>> 
>> Please review these changes to enable the _vectorizedMismatch intrinsic on RISC-V platforms that support RVV instructions.
>> 
>> Thank you,
>> -Yuri Gaevsky
>> 
>> **Correctness checks:**
>>   hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.
>
> Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - Merge master
>  - 8324124: RISC-V: implement _vectorizedMismatch intrinsic
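
For readers unfamiliar with the benchmark names below: judging by the names, each case differs in where the first mismatch falls. The sketch below is only an illustration of that idea using the public `java.util.Arrays.mismatch` API, which for sufficiently long arrays reaches the internal vectorized mismatch routine this intrinsic replaces. The class name `MismatchSketch` and the helper `flipAt` are hypothetical and are not taken from the JMH benchmark source.

```java
import java.util.Arrays;

public class MismatchSketch {
    // Hypothetical helper: returns a copy of 'a' that differs from it only at 'index'.
    static byte[] flipAt(byte[] a, int index) {
        byte[] b = a.clone();
        b[index] ^= 1; // flip the lowest bit so the element is guaranteed to differ
        return b;
    }

    public static void main(String[] args) {
        byte[] a = new byte[800];
        Arrays.fill(a, (byte) 7);

        // Identical arrays: Arrays.mismatch returns -1.
        System.out.println("matches:       " + Arrays.mismatch(a, a.clone()));
        // Difference at the first, middle, and last element respectively:
        // Arrays.mismatch returns the index of the first differing element.
        System.out.println("mismatchStart: " + Arrays.mismatch(a, flipAt(a, 0)));
        System.out.println("mismatchMid:   " + Arrays.mismatch(a, flipAt(a, a.length / 2)));
        System.out.println("mismatchEnd:   " + Arrays.mismatch(a, flipAt(a, a.length - 1)));
    }
}
```

An early mismatch leaves very little work to do, so a case like mismatchStart mostly measures the fixed cost of entering the routine rather than its throughput.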

More fine-grained data on the same hardware as above:

Legend: UseVMI ==> UseVectorizedMismatchIntrinsic
---------------------------------------------------------------------------------------------
                                                    (baseline)              (patch)
---------------------------------------------------------------------------------------------
                                           |-XX:-UseVMI -XX:+UseRVV|-XX:+UseVMI -XX:+UseRVV|
---------------------------------------------------------------------------------------------
Benchmark                      (size)  Mode  Cnt     Score     Error    Score    Error  Units
Byte.differentSubrangeMatches     100  avgt   10   229.256 ±   0.198   69.741 ±  0.275  ns/op
Byte.differentSubrangeMatches     200  avgt   10   201.288 ±   0.190   66.506 ±  0.067  ns/op
Byte.differentSubrangeMatches     300  avgt   10   536.125 ±   0.365  115.255 ±  1.541  ns/op
Byte.differentSubrangeMatches     400  avgt   10   556.112 ±  10.091   74.600 ±  0.364  ns/op
Byte.differentSubrangeMatches     500  avgt   10   870.934 ±   0.696   67.701 ±  0.044  ns/op
Byte.differentSubrangeMatches     600  avgt   10   425.375 ±   0.235  102.945 ±  0.337  ns/op
Byte.differentSubrangeMatches     700  avgt   10  1909.673 ±  53.585  103.073 ±  0.414  ns/op
Byte.differentSubrangeMatches     800  avgt   10   873.724 ±   0.232  102.817 ±  0.068  ns/op
Byte.matches                      100  avgt   10    65.256 ±   0.048   95.762 ± 16.464  ns/op
Byte.matches                      200  avgt   10    88.623 ±   0.149   97.243 ±  1.450  ns/op
Byte.matches                      300  avgt   10   204.775 ±   4.687   92.739 ±  0.047  ns/op
Byte.matches                      400  avgt   10   153.686 ±   0.278  141.477 ± 46.844  ns/op
Byte.matches                      500  avgt   10   310.993 ±   5.080   91.547 ±  0.060  ns/op
Byte.matches                      600  avgt   10   185.652 ±   0.102  124.798 ±  0.066  ns/op
Byte.matches                      700  avgt   10   304.217 ± 124.206  124.744 ±  0.075  ns/op
Byte.matches                      800  avgt   10   239.599 ±   0.110  158.000 ±  0.088  ns/op
Byte.mismatchEnd                  100  avgt   10    60.562 ±   0.358   55.872 ±  0.357  ns/op
Byte.mismatchEnd                  200  avgt   10    84.626 ±   0.038   56.092 ±  0.263  ns/op
Byte.mismatchEnd                  300  avgt   10   114.080 ±   0.071   87.155 ±  0.062  ns/op
Byte.mismatchEnd                  400  avgt   10   149.347 ±   0.254   87.151 ±  0.055  ns/op
Byte.mismatchEnd                  500  avgt   10   180.149 ±   0.426   94.018 ±  0.050  ns/op
Byte.mismatchEnd                  600  avgt   10   182.495 ±   0.103  122.891 ±  0.067  ns/op
Byte.mismatchEnd                  700  avgt   10   222.248 ±   0.480  121.082 ±  0.082  ns/op
Byte.mismatchEnd                  800  avgt   10   235.855 ±   0.338  154.226 ±  0.089  ns/op
Byte.mismatchMid                  100  avgt   10    38.859 ±   0.122   55.244 ±  0.303  ns/op
Byte.mismatchMid                  200  avgt   10    54.352 ±   0.517   55.070 ±  0.236  ns/op
Byte.mismatchMid                  300  avgt   10    68.996 ±   0.057   69.280 ± 26.763  ns/op
Byte.mismatchMid                  400  avgt   10    87.560 ±   0.389   61.302 ±  0.241  ns/op
Byte.mismatchMid                  500  avgt   10   100.948 ±   0.061   55.135 ±  0.259  ns/op
Byte.mismatchMid                  600  avgt   10   118.616 ±   0.130   87.158 ±  0.106  ns/op
Byte.mismatchMid                  700  avgt   10   132.310 ±   0.112   92.138 ±  0.040  ns/op
Byte.mismatchMid                  800  avgt   10   170.413 ±  62.978   90.976 ±  0.060  ns/op
Byte.mismatchStart                100  avgt   10    23.837 ±   0.034   55.365 ±  0.437  ns/op
Byte.mismatchStart                200  avgt   10    23.200 ±   0.012   55.399 ±  0.168  ns/op
Byte.mismatchStart                300  avgt   10    36.600 ±   0.763   54.907 ±  0.235  ns/op
Byte.mismatchStart                400  avgt   10    23.188 ±   0.010   54.653 ±  0.226  ns/op
Byte.mismatchStart                500  avgt   10    36.171 ±   0.325   55.461 ±  0.184  ns/op
Byte.mismatchStart                600  avgt   10    23.829 ±   0.048   55.495 ±  0.448  ns/op
Byte.mismatchStart                700  avgt   10    23.815 ±   0.007   56.140 ±  0.270  ns/op
Byte.mismatchStart                800  avgt   10    25.100 ±   0.034   53.920 ±  0.035  ns/op
---------------------------------------------------------------------------------------------
Short.differentSubrangeMatches     100  avgt   10   282.444 ±  0.153   70.040 ± 0.125  ns/op
Short.differentSubrangeMatches     200  avgt   10   135.741 ±  0.200   68.342 ± 0.057  ns/op
Short.differentSubrangeMatches     300  avgt   10  1277.808 ± 14.456  103.454 ± 0.120  ns/op
Short.differentSubrangeMatches     400  avgt   10   223.451 ±  0.293  171.933 ± 2.561  ns/op
Short.differentSubrangeMatches     500  avgt   10  2138.447 ± 25.749  171.097 ± 2.585  ns/op
Short.differentSubrangeMatches     600  avgt   10   314.485 ±  0.242  139.807 ± 0.079  ns/op
Short.differentSubrangeMatches     700  avgt   10  1754.537 ±  1.708  138.499 ± 0.027  ns/op
Short.differentSubrangeMatches     800  avgt   10   406.632 ±  0.264  171.009 ± 0.708  ns/op
Short.matches                      100  avgt   10    84.960 ±  0.301   60.805 ± 0.017  ns/op
Short.matches                      200  avgt   10   146.112 ±  0.165  100.379 ± 0.159  ns/op
Short.matches                      300  avgt   10   179.960 ±  0.169  129.123 ± 0.070  ns/op
Short.matches                      400  avgt   10   232.550 ±  0.407  162.544 ± 0.050  ns/op
Short.matches                      500  avgt   10   284.438 ±  0.351  169.978 ± 0.114  ns/op
Short.matches                      600  avgt   10   341.725 ±  0.382  198.988 ± 0.136  ns/op
Short.matches                      700  avgt   10   394.701 ±  1.911  397.286 ± 4.721  ns/op
Short.matches                      800  avgt   10   444.682 ±  0.399  260.794 ± 0.155  ns/op
Short.mismatchEnd                  100  avgt   10    78.362 ±  0.026   54.080 ± 0.205  ns/op
Short.mismatchEnd                  200  avgt   10   137.212 ±  0.102   86.555 ± 0.135  ns/op
Short.mismatchEnd                  300  avgt   10   172.287 ±  0.125  120.342 ± 0.079  ns/op
Short.mismatchEnd                  400  avgt   10   223.305 ±  0.159  153.056 ± 0.331  ns/op
Short.mismatchEnd                  500  avgt   10   276.604 ±  0.122  153.631 ± 0.132  ns/op
Short.mismatchEnd                  600  avgt   10   331.104 ±  1.805  186.172 ± 0.057  ns/op
Short.mismatchEnd                  700  avgt   10   384.149 ±  1.776  225.087 ± 0.171  ns/op
Short.mismatchEnd                  800  avgt   10   437.838 ±  0.401  257.156 ± 0.221  ns/op
Short.mismatchMid                  100  avgt   10    51.411 ±  0.009   90.411 ± 0.519  ns/op
Short.mismatchMid                  200  avgt   10    83.834 ±  0.685   90.383 ± 1.926  ns/op
Short.mismatchMid                  300  avgt   10   112.123 ±  0.822   86.532 ± 0.033  ns/op
Short.mismatchMid                  400  avgt   10   141.735 ±  0.667   90.890 ± 0.067  ns/op
Short.mismatchMid                  500  avgt   10   169.345 ±  0.166  144.730 ± 3.164  ns/op
Short.mismatchMid                  600  avgt   10   173.554 ±  0.153  123.545 ± 0.122  ns/op
Short.mismatchMid                  700  avgt   10   194.393 ±  0.262  119.735 ± 0.054  ns/op
Short.mismatchMid                  800  avgt   10   222.163 ±  0.390  158.130 ± 0.087  ns/op
Short.mismatchStart                100  avgt   10    22.566 ±  0.007   54.399 ± 0.293  ns/op
Short.mismatchStart                200  avgt   10    25.699 ±  0.011   54.923 ± 0.127  ns/op
Short.mismatchStart                300  avgt   10    25.702 ±  0.014   54.097 ± 0.254  ns/op
Short.mismatchStart                400  avgt   10    23.822 ±  0.009   99.891 ± 0.788  ns/op
Short.mismatchStart                500  avgt   10    25.721 ±  0.022   54.128 ± 0.266  ns/op
Short.mismatchStart                600  avgt   10    25.704 ±  0.007   53.493 ± 0.203  ns/op
Short.mismatchStart                700  avgt   10    40.538 ±  0.413   54.189 ± 0.184  ns/op
Short.mismatchStart                800  avgt   10    25.706 ±  0.015   57.905 ± 0.243  ns/op
---------------------------------------------------------------------------------------------
Int.differentSubrangeMatches     100  avgt   10  154.976 ±  0.633   69.036 ±   0.070  ns/op
Int.differentSubrangeMatches     200  avgt   10  156.642 ±  0.483  101.957 ±   0.316  ns/op
Int.differentSubrangeMatches     300  avgt   10  394.074 ±  1.545  138.248 ±   0.613  ns/op
Int.differentSubrangeMatches     400  avgt   10  241.063 ±  0.464  170.497 ±   0.551  ns/op
Int.differentSubrangeMatches     500  avgt   10  630.403 ±  0.356  172.495 ±   0.113  ns/op
Int.differentSubrangeMatches     600  avgt   10  356.048 ±  0.451  345.645 ±   5.213  ns/op
Int.differentSubrangeMatches     700  avgt   10  874.046 ±  7.070  239.370 ±   0.333  ns/op
Int.differentSubrangeMatches     800  avgt   10  767.832 ± 14.872  275.932 ±   0.195  ns/op
Int.matches                      100  avgt   10  229.800 ± 67.886   99.062 ±   0.071  ns/op
Int.matches                      200  avgt   10  228.766 ±  0.156  160.609 ±   0.131  ns/op
Int.matches                      300  avgt   10  337.579 ±  0.437  198.166 ±   0.145  ns/op
Int.matches                      400  avgt   10  442.784 ±  2.386  264.596 ±   0.155  ns/op
Int.matches                      500  avgt   10  902.691 ± 10.068  511.427 ±   6.414  ns/op
Int.matches                      600  avgt   10  626.352 ±  0.474  615.184 ±   6.823  ns/op
Int.matches                      700  avgt   10  733.358 ±  0.209  397.746 ±   0.280  ns/op
Int.matches                      800  avgt   10  835.602 ±  0.761  464.160 ±   0.309  ns/op
Int.mismatchEnd                  100  avgt   10  133.112 ±  0.202   86.483 ±   0.034  ns/op
Int.mismatchEnd                  200  avgt   10  351.115 ± 69.363  153.029 ±   0.104  ns/op
Int.mismatchEnd                  300  avgt   10  326.137 ±  0.202  189.315 ±   0.091  ns/op
Int.mismatchEnd                  400  avgt   10  430.870 ±  0.388  346.663 ± 142.707  ns/op
Int.mismatchEnd                  500  avgt   10  512.747 ±  0.476  296.076 ±   0.153  ns/op
Int.mismatchEnd                  600  avgt   10  614.686 ±  0.501  352.447 ±   0.220  ns/op
Int.mismatchEnd                  700  avgt   10  718.262 ±  0.872  390.380 ±   0.385  ns/op
Int.mismatchEnd                  800  avgt   10  823.137 ±  1.092  456.725 ±   0.133  ns/op
Int.mismatchMid                  100  avgt   10   87.525 ±  0.436   53.380 ±   0.099  ns/op
Int.mismatchMid                  200  avgt   10  140.749 ±  0.345   90.916 ±   0.067  ns/op
Int.mismatchMid                  300  avgt   10  173.054 ±  0.186  123.485 ±   0.056  ns/op
Int.mismatchMid                  400  avgt   10  222.213 ±  0.251  152.981 ±   0.104  ns/op
Int.mismatchMid                  500  avgt   10  274.683 ±  0.206  259.329 ±   2.498  ns/op
Int.mismatchMid                  600  avgt   10  325.381 ±  0.138  197.549 ±   0.093  ns/op
Int.mismatchMid                  700  avgt   10  373.725 ±  0.333  223.880 ±   0.116  ns/op
Int.mismatchMid                  800  avgt   10  419.002 ±  0.539  252.679 ±   0.126  ns/op
Int.mismatchStart                100  avgt   10   29.408 ±  0.101   53.512 ±   0.507  ns/op
Int.mismatchStart                200  avgt   10   29.469 ±  2.875   89.489 ±   2.456  ns/op
Int.mismatchStart                300  avgt   10   28.874 ±  0.136   89.833 ±   2.523  ns/op
Int.mismatchStart                400  avgt   10   28.853 ±  0.020   96.711 ±   1.345  ns/op
Int.mismatchStart                500  avgt   10   28.830 ±  0.009   53.488 ±   0.177  ns/op
Int.mismatchStart                600  avgt   10   28.839 ±  0.028   53.415 ±   0.118  ns/op
Int.mismatchStart                700  avgt   10   28.835 ±  0.019   54.031 ±   0.143  ns/op
Int.mismatchStart                800  avgt   10   28.858 ±  0.108   75.214 ±  28.338  ns/op
---------------------------------------------------------------------------------------------
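
To make the table easier to read, the baseline/patch ratio can be computed per row. The following is a minimal sketch that does this for four rows copied from the table above. The class name `SpeedupSketch` and the method `speedup` are made up for illustration, and the ratio ignores the reported error margins, so rows with a large error (for example Byte.matches 400) should be read with care.

```java
import java.util.Locale;

public class SpeedupSketch {
    // Ratio of baseline time to patched time; a value above 1.0 means the patch is faster.
    static double speedup(double baselineNsPerOp, double patchNsPerOp) {
        return baselineNsPerOp / patchNsPerOp;
    }

    public static void main(String[] args) {
        // Benchmark name and size, copied from the table above.
        String[] names = {
            "Byte.differentSubrangeMatches 500",
            "Int.matches 800",
            "Byte.mismatchStart 100",
            "Short.mismatchStart 400",
        };
        // {baseline score, patch score} in ns/op for the rows named above.
        double[][] scores = {
            {870.934, 67.701},
            {835.602, 464.160},
            {23.837, 55.365},
            {23.822, 99.891},
        };
        for (int i = 0; i < names.length; i++) {
            double ratio = speedup(scores[i][0], scores[i][1]);
            System.out.printf(Locale.ROOT, "%-36s %6.2fx%n", names[i], ratio);
        }
    }
}
```

For these rows the ratio is well above 1 for differentSubrangeMatches and matches, and below 1 for the mismatchStart cases, where the baseline is already in the 23-40 ns range.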

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2885999588

