RFR: 8318217: RISC-V: C2 VectorizedHashCode [v10]

Yuri Gaevsky duke at openjdk.org
Wed Dec 6 22:09:40 UTC 2023


On Wed, 6 Dec 2023 21:58:55 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> Hello All,
>> 
>> Please review these changes to support the _vectorizedHashCode intrinsic on
>> the RISC-V platform. The patch adds "scalar" code for the intrinsic that
>> uses no RVV instructions but manually unrolls the hash loop. Code using
>> RVV instructions could be added as a follow-up to this patch or
>> independently.
>> 
>> Thanks,
>> -Yuri Gaevsky
>> 
>> P.S. My OCA has been accepted recently (ygaevsky).
>> 
>> ### Correctness checks
>> 
>> Testing: tier1 tests passed successfully on a RISC-V StarFive JH7110 board running Linux.
>> 
>> ### Performance results (the numbers for non-ints are similar)
>> 
>> #### StarFive JH7110 board:
>> 
>> 
>> ArraysHashCode:              without intrinsic      with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
>> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
>> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
>> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
>> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
>> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
>> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
>> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
>> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
>> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
>> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
>> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
>> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
>> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
>> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
>> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
>> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
>> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Added two temp registers for loads; all loads in the wide loop have been moved to the start of the loop.
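For reference, the manual unrolling mentioned above rests on a simple identity of the polynomial hash: several sequential steps of h = 31*h + a[k] collapse into one step that uses precomputed powers of 31. An unroll factor of 4 is assumed here purely for illustration. The identity also holds in Java's 32-bit wrapping arithmetic, since it uses only additions and multiplications.

$$
\begin{aligned}
h_{k+1} &= 31\,h_k + a_k \\
h_{k+4} &= 31^4 h_k + 31^3 a_k + 31^2 a_{k+1} + 31\,a_{k+2} + a_{k+3}
\end{aligned}
$$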

The results of commit 99f91d0 are below.

SiFive Unmatched:

                                                 without intrinsic      with intrinsic
Benchmark                   (size)  Mode  Cnt      Score     Error      Score     Error  Units
ArraysHashCode.bytes            10  avgt   15     65.190 ±   0.954     45.527 ±   1.771  ns/op
ArraysHashCode.bytes           100  avgt   15    321.443 ±   5.586    258.807 ±   4.922  ns/op
ArraysHashCode.bytes          1000  avgt   15   2878.206 ±   9.105   2347.219 ±   8.947  ns/op
ArraysHashCode.bytes         10000  avgt   15  28421.840 ±  35.467  23160.425 ±  30.340  ns/op
ArraysHashCode.chars            10  avgt   15     64.544 ±   1.713     50.808 ±   2.629  ns/op
ArraysHashCode.chars           100  avgt   15    338.919 ±   1.623    265.971 ±   4.874  ns/op
ArraysHashCode.chars          1000  avgt   15   2986.972 ±   4.009   2336.699 ±   2.537  ns/op
ArraysHashCode.chars         10000  avgt   15  29474.441 ±  14.634  23161.582 ±  29.067  ns/op
ArraysHashCode.ints             10  avgt   15     57.104 ±   2.517     46.034 ±   0.602  ns/op
ArraysHashCode.ints            100  avgt   15    330.264 ±   4.543    258.327 ±   1.517  ns/op
ArraysHashCode.ints           1000  avgt   15   2995.208 ±   3.188   2339.664 ±   6.849  ns/op
ArraysHashCode.ints          10000  avgt   15  33855.312 ± 115.319  27836.954 ±  27.304  ns/op
ArraysHashCode.multibytes       10  avgt   15     34.378 ±   0.230     27.076 ±   0.108  ns/op
ArraysHashCode.multibytes      100  avgt   15    193.131 ±   0.370    141.907 ±   0.244  ns/op
ArraysHashCode.multibytes     1000  avgt   15   1651.909 ±   7.812   1377.842 ±  10.299  ns/op
ArraysHashCode.multibytes    10000  avgt   15  16620.685 ±  37.854  13960.556 ±  43.473  ns/op
ArraysHashCode.multichars       10  avgt   15     35.104 ±   0.195     26.308 ±   0.127  ns/op
ArraysHashCode.multichars      100  avgt   15    204.391 ±   0.233    144.662 ±   0.337  ns/op
ArraysHashCode.multichars     1000  avgt   15   1902.088 ±   6.922   1579.549 ±   7.266  ns/op
ArraysHashCode.multichars    10000  avgt   15  18905.923 ±  79.263  15952.155 ±  68.664  ns/op
ArraysHashCode.multiints        10  avgt   15     35.111 ±   0.093     26.551 ±   0.264  ns/op
ArraysHashCode.multiints       100  avgt   15    211.251 ±   0.550    153.683 ±   0.208  ns/op
ArraysHashCode.multiints      1000  avgt   15   2223.176 ±   8.982   1927.689 ±   7.075  ns/op
ArraysHashCode.multiints     10000  avgt   15  31567.767 ± 249.609  29463.762 ± 186.245  ns/op
ArraysHashCode.multishorts      10  avgt   15     35.311 ±   0.313     26.372 ±   0.116  ns/op
ArraysHashCode.multishorts     100  avgt   15    203.294 ±   0.241    144.988 ±   0.494  ns/op
ArraysHashCode.multishorts    1000  avgt   15   1898.485 ±   6.704   1579.381 ±   5.600  ns/op
ArraysHashCode.multishorts   10000  avgt   15  18855.850 ±  66.545  15718.005 ±  75.154  ns/op
ArraysHashCode.shorts           10  avgt   15     56.418 ±   0.186     47.488 ±   2.261  ns/op
ArraysHashCode.shorts          100  avgt   15    337.844 ±   1.202    256.671 ±   0.761  ns/op
ArraysHashCode.shorts         1000  avgt   15   2988.457 ±   6.158   2337.570 ±   2.510  ns/op
ArraysHashCode.shorts        10000  avgt   15  29506.107 ±  41.616  23148.772 ±  40.625  ns/op


T-Head RVB-ICE:

                                                without intrinsic      with intrinsic
Benchmark                   (size)  Mode  Cnt      Score    Error      Score    Error  Units
ArraysHashCode.bytes            10  avgt   15     53.463 ±  0.274     46.625 ±  0.247  ns/op
ArraysHashCode.bytes           100  avgt   15    280.976 ±  1.478    225.197 ±  1.141  ns/op
ArraysHashCode.bytes          1000  avgt   15   2553.393 ±  4.925   1818.613 ±  3.789  ns/op
ArraysHashCode.bytes         10000  avgt   15  25138.794 ± 39.992  16787.514 ± 59.261  ns/op
ArraysHashCode.chars            10  avgt   15     52.075 ±  0.246     45.924 ±  0.561  ns/op
ArraysHashCode.chars           100  avgt   15    283.441 ±  0.743    237.660 ±  1.074  ns/op
ArraysHashCode.chars          1000  avgt   15   2562.833 ±  3.370   1915.665 ±  4.166  ns/op
ArraysHashCode.chars         10000  avgt   15  25168.219 ± 94.226  18843.917 ± 51.859  ns/op
ArraysHashCode.ints             10  avgt   15     52.126 ±  0.382     46.739 ±  0.366  ns/op
ArraysHashCode.ints            100  avgt   15    283.643 ±  0.901    242.191 ±  0.776  ns/op
ArraysHashCode.ints           1000  avgt   15   2556.508 ±  6.937   1913.271 ±  2.920  ns/op
ArraysHashCode.ints          10000  avgt   15  25171.578 ± 51.725  18835.638 ± 49.785  ns/op
ArraysHashCode.multibytes       10  avgt   15     26.432 ±  0.157     18.762 ±  0.184  ns/op
ArraysHashCode.multibytes      100  avgt   15    160.788 ±  0.484    117.339 ±  0.285  ns/op
ArraysHashCode.multibytes     1000  avgt   15   1366.697 ±  9.217    923.814 ±  4.709  ns/op
ArraysHashCode.multibytes    10000  avgt   15  13360.445 ± 22.830   9350.136 ± 18.251  ns/op
ArraysHashCode.multichars       10  avgt   15     26.732 ±  0.181     19.234 ±  0.136  ns/op
ArraysHashCode.multichars      100  avgt   15    164.043 ±  0.310    117.900 ±  0.386  ns/op
ArraysHashCode.multichars     1000  avgt   15   1398.259 ±  2.765   1030.563 ±  2.701  ns/op
ArraysHashCode.multichars    10000  avgt   15  13331.460 ± 21.356   9749.817 ± 23.566  ns/op
ArraysHashCode.multiints        10  avgt   15     25.972 ±  0.135     18.745 ±  0.155  ns/op
ArraysHashCode.multiints       100  avgt   15    169.487 ±  0.357    125.620 ±  0.330  ns/op
ArraysHashCode.multiints      1000  avgt   15   1399.977 ±  9.000   1036.132 ±  3.237  ns/op
ArraysHashCode.multiints     10000  avgt   15  13760.907 ± 23.137  10324.485 ± 18.437  ns/op
ArraysHashCode.multishorts      10  avgt   15     26.541 ±  0.223     19.389 ±  0.151  ns/op
ArraysHashCode.multishorts     100  avgt   15    163.990 ±  0.301    117.797 ±  0.419  ns/op
ArraysHashCode.multishorts    1000  avgt   15   1402.545 ±  3.285   1031.649 ±  7.023  ns/op
ArraysHashCode.multishorts   10000  avgt   15  13349.611 ± 25.599   9778.011 ± 19.135  ns/op
ArraysHashCode.shorts           10  avgt   15     52.037 ±  0.265     46.881 ±  0.636  ns/op
ArraysHashCode.shorts          100  avgt   15    285.775 ±  0.702    244.200 ±  1.012  ns/op
ArraysHashCode.shorts         1000  avgt   15   2553.894 ±  5.309   1926.098 ±  3.496  ns/op
ArraysHashCode.shorts        10000  avgt   15  25201.063 ± 95.129  18843.485 ± 73.870  ns/op


StarFive JH7110:

                                                without intrinsic      with intrinsic
Benchmark                   (size)  Mode  Cnt      Score    Error      Score    Error  Units
ArraysHashCode.bytes            10  avgt   15     41.093 ±  0.541     34.051 ±  0.032  ns/op
ArraysHashCode.bytes           100  avgt   15    250.250 ±  0.846    201.460 ±  0.631  ns/op
ArraysHashCode.bytes          1000  avgt   15   2283.792 ±  0.293   1855.048 ±  0.337  ns/op
ArraysHashCode.bytes         10000  avgt   15  22613.649 ± 85.647  18454.512 ± 93.310  ns/op
ArraysHashCode.chars            10  avgt   15     45.441 ±  0.108     34.747 ±  0.008  ns/op
ArraysHashCode.chars           100  avgt   15    261.762 ±  1.081    203.169 ±  0.118  ns/op
ArraysHashCode.chars          1000  avgt   15   2372.976 ±  1.541   1856.964 ±  4.764  ns/op
ArraysHashCode.chars         10000  avgt   15  23429.722 ±  6.530  18390.679 ±  2.956  ns/op
ArraysHashCode.ints             10  avgt   15     45.530 ±  0.284     34.744 ±  0.005  ns/op
ArraysHashCode.ints            100  avgt   15    261.117 ±  0.721    203.332 ±  0.218  ns/op
ArraysHashCode.ints           1000  avgt   15   2373.573 ±  3.175   1856.836 ±  0.223  ns/op
ArraysHashCode.ints          10000  avgt   15  29624.472 ± 44.767  24626.598 ± 54.767  ns/op
ArraysHashCode.multibytes       10  avgt   15     26.975 ±  0.259     19.854 ±  0.114  ns/op
ArraysHashCode.multibytes      100  avgt   15    156.220 ±  0.247    113.744 ±  0.366  ns/op
ArraysHashCode.multibytes     1000  avgt   15   1296.236 ±  7.541   1073.224 ±  4.383  ns/op
ArraysHashCode.multibytes    10000  avgt   15  12779.460 ±  2.007  10593.835 ±  5.531  ns/op
ArraysHashCode.multichars       10  avgt   15     27.520 ±  0.102     19.992 ±  0.054  ns/op
ArraysHashCode.multichars      100  avgt   15    166.026 ±  0.695    117.982 ±  0.639  ns/op
ArraysHashCode.multichars     1000  avgt   15   1430.447 ±  1.517   1165.180 ±  5.783  ns/op
ArraysHashCode.multichars    10000  avgt   15  14134.839 ±  6.270  11499.764 ± 37.546  ns/op
ArraysHashCode.multiints        10  avgt   15     26.872 ±  0.066     20.127 ±  0.083  ns/op
ArraysHashCode.multiints       100  avgt   15    178.919 ±  0.245    132.377 ±  0.484  ns/op
ArraysHashCode.multiints      1000  avgt   15   1607.719 ±  2.903   1339.118 ±  8.704  ns/op
ArraysHashCode.multiints     10000  avgt   15  16390.804 ± 49.820  13706.741 ± 11.994  ns/op
ArraysHashCode.multishorts      10  avgt   15     27.749 ±  0.165     20.011 ±  0.096  ns/op
ArraysHashCode.multishorts     100  avgt   15    166.625 ±  0.592    119.115 ±  0.324  ns/op
ArraysHashCode.multishorts    1000  avgt   15   1429.682 ±  1.607   1165.839 ±  6.013  ns/op
ArraysHashCode.multishorts   10000  avgt   15  14199.682 ±  6.483  11493.880 ±  6.484  ns/op
ArraysHashCode.shorts           10  avgt   15     45.878 ±  0.348     34.768 ±  0.116  ns/op
ArraysHashCode.shorts          100  avgt   15    260.598 ±  0.079    203.937 ±  0.078  ns/op
ArraysHashCode.shorts         1000  avgt   15   2374.712 ±  7.961   1857.542 ±  0.248  ns/op
ArraysHashCode.shorts        10000  avgt   15  23428.899 ±  4.859  18433.195 ± 34.224  ns/op

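The improvements on SiFive/StarFive came after moving all memory loads up to the start of the loop. Could this reflect the difference between in-order and out-of-order CPUs? The SiFive U74 cores in the Unmatched and the JH7110 are in-order, while the T-Head C910 in the RVB-ICE is out-of-order.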
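To make the "all loads at the start of the loop" shape concrete, here is a minimal Java sketch of a 4-way unrolled polynomial hash whose loop body issues all four loads before any multiply-add. It is only an illustration of the loop structure, not the RISC-V assembly from the patch. The unroll factor of 4 and the names `UnrolledHashSketch` and `unrolledHash` are hypothetical. The reference semantics are those of `Arrays.hashCode(int[])`, that is, `h = 31 * h + a[i]` starting from 1.

```java
import java.util.Arrays;

/** Hypothetical sketch: 4-way unrolled polynomial hash with loads grouped first. */
public final class UnrolledHashSketch {
    // Powers of 31 that combine four elements per iteration (int overflow wraps, as in Java).
    private static final int P1 = 31;
    private static final int P2 = 31 * 31;           // 961
    private static final int P3 = 31 * 31 * 31;      // 29791
    private static final int P4 = 31 * 31 * 31 * 31; // 923521

    /** Returns the same value as Arrays.hashCode(a) for a non-null array. */
    static int unrolledHash(int[] a) {
        int h = 1;
        int i = 0;
        int wideEnd = a.length & ~3; // largest multiple of 4 that is <= a.length
        for (; i < wideEnd; i += 4) {
            // All four loads come first, mirroring the patch's idea. In Java source this
            // ordering is only illustrative; the JIT schedules the real instructions.
            int x0 = a[i];
            int x1 = a[i + 1];
            int x2 = a[i + 2];
            int x3 = a[i + 3];
            // h_{i+4} = 31^4*h + 31^3*x0 + 31^2*x1 + 31*x2 + x3
            h = P4 * h + P3 * x0 + P2 * x1 + P1 * x2 + x3;
        }
        // Scalar tail for the remaining 0..3 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, -7, 42, Integer.MAX_VALUE, 0, 19, Integer.MIN_VALUE};
        System.out.println(unrolledHash(data) == Arrays.hashCode(data)); // prints "true"
    }
}
```

Because the combined step only uses additions and multiplications, it gives bit-identical results to the sequential loop even when intermediate values overflow. The unrolling shortens the dependency chain through `h` to one update per four elements.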

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16629#issuecomment-1843766166

