RFR: 8318217: RISC-V: C2 VectorizedHashCode [v12]

Fei Yang fyang at openjdk.org
Mon Dec 11 07:29:22 UTC 2023


On Sun, 10 Dec 2023 12:14:35 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> Hello All,
>> 
>> Please review these changes to support _vectorizedHashCode intrinsic on
>> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
>> usage of any RVV instruction but provides manual unrolling of the appropriate
>> loop. The code with usage of RVV instruction could be added as follow-up of
>> the patch or independently.
>> 
>> Thanks,
>> -Yuri Gaevsky
>> 
>> P.S. My OCA has been accepted recently (ygaevsky).
>> 
>> ### Correctness checks
>> 
>> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>> 
>> ### Performance results (the numbers for non-ints are similar)
>> 
>> #### StarFive JH7110 board:
>> 
>> 
>> ArraysHashCode:              without intrinsic      with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
>> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
>> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
>> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
>> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
>> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
>> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
>> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
>> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
>> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
>> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
>> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
>> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
>> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
>> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
>> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
>> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
>> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Moved zero check for cnt before TAIL per @RealFYang suggestion.

Thanks for the update. I gave it a second try with some tuning. I see up to 7%+ extra improvement on a licheepi-4a board (T-Head C910) with the following small add-on change (no obvious change on the Unmatched board). It materializes the powers of 31 with direct `mv` instructions and avoids loading them from the `_arrays_hashcode_powers_of_31` array, which would involve calculating the array address. We could then remove the `_arrays_hashcode_powers_of_31` array altogether.


diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
index 11cbcaa48a1..fe82b7a4e74 100644
--- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
@@ -1493,16 +1493,16 @@ void C2_MacroAssembler::arrays_hashcode(Register ary, Register cnt, Register res

   beqz(cnt, DONE);

-  addiw(pow31_2, zr, 961);       // [31^^2]
   andi(chunks, cnt, ~(stride-1));
   beqz(chunks, TAIL);

+  mv(pow31_4, 923521); // [31^^4]
+  mv(pow31_3, 29791);  // [31^^3]
+  mv(pow31_2, 961);    // [31^^2]
+
   slli(chunks_end, chunks, chunks_end_shift);
   add(chunks_end, ary, chunks_end);
   andi(cnt, cnt, stride-1);      // don't forget about tail!
-  ld(pow31_4, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31()
-                              + 0 * sizeof(jint))); // [31^^3:31^^4]
-  srli(pow31_3, pow31_4, 32);

   bind(WIDE_LOOP);
   mulw(result, result, pow31_4); // 31^^4 * h
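
For readers less familiar with the unrolling, here is a minimal C++ sketch of what the wide loop computes, assuming a stride of 4 as the register names `pow31_2` .. `pow31_4` suggest. The function names `hash_simple` and `hash_unrolled` and the `init` parameter are hypothetical and only serve the illustration; this is not the HotSpot code itself. It shows that the powers of 31 are small compile-time constants, so they can be materialized as immediates rather than loaded from a table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference form: h = 31 * h + a[i], in wrap-around 32-bit arithmetic.
// Unsigned math is used so that overflow is well defined in C++; the
// result has the same bit pattern as the corresponding Java int.
inline std::uint32_t hash_simple(const std::int32_t* a, std::size_t n,
                                 std::uint32_t init = 1) {
  assert(a != nullptr || n == 0);
  std::uint32_t h = init;
  for (std::size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}

// Unrolled form (assumed stride of 4): four elements per iteration, with
// the powers of 31 held as constants instead of being read from memory.
inline std::uint32_t hash_unrolled(const std::int32_t* a, std::size_t n,
                                   std::uint32_t init = 1) {
  assert(a != nullptr || n == 0);
  constexpr std::uint32_t pow31_2 = 961u;     // 31^2
  constexpr std::uint32_t pow31_3 = 29791u;   // 31^3
  constexpr std::uint32_t pow31_4 = 923521u;  // 31^4
  static_assert(pow31_2 == 31u * 31u, "31^2");
  static_assert(pow31_3 == pow31_2 * 31u, "31^3");
  static_assert(pow31_4 == pow31_3 * 31u, "31^4");

  constexpr std::size_t stride = 4;
  const std::size_t chunks = n & ~(stride - 1);  // elements handled by the wide loop

  std::uint32_t h = init;
  std::size_t i = 0;
  for (; i < chunks; i += stride) {              // wide loop
    h = pow31_4 * h
      + pow31_3 * static_cast<std::uint32_t>(a[i])
      + pow31_2 * static_cast<std::uint32_t>(a[i + 1])
      + 31u     * static_cast<std::uint32_t>(a[i + 2])
      +           static_cast<std::uint32_t>(a[i + 3]);
  }
  for (; i < n; i++) {                           // tail
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}
```

Both functions return the same value for any input length, since expanding four steps of `h = 31 * h + a[i]` gives exactly the wide-loop expression.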


1. licheepi-4a / without add-on change:

Benchmark                   (size)  Mode  Cnt      Score    Error  Units
ArraysHashCode.bytes             1  avgt   15     21.327 ±  0.035  ns/op
ArraysHashCode.bytes            10  avgt   15     33.195 ±  0.166  ns/op
ArraysHashCode.bytes           100  avgt   15    154.175 ±  3.433  ns/op
ArraysHashCode.bytes         10000  avgt   15  12318.680 ± 25.131  ns/op
ArraysHashCode.chars             1  avgt   15     20.965 ±  0.598  ns/op
ArraysHashCode.chars            10  avgt   15     33.097 ±  0.117  ns/op
ArraysHashCode.chars           100  avgt   15    153.510 ±  0.280  ns/op
ArraysHashCode.chars         10000  avgt   15  11881.690 ± 44.507  ns/op
ArraysHashCode.ints              1  avgt   15     21.330 ±  0.070  ns/op
ArraysHashCode.ints             10  avgt   15     33.409 ±  0.225  ns/op
ArraysHashCode.ints            100  avgt   15    154.254 ±  0.650  ns/op
ArraysHashCode.ints          10000  avgt   15  11833.894 ± 73.945  ns/op
ArraysHashCode.multibytes        1  avgt   15      3.468 ±  0.046  ns/op
ArraysHashCode.multibytes       10  avgt   15     12.412 ±  0.126  ns/op
ArraysHashCode.multibytes      100  avgt   15     75.963 ±  0.267  ns/op
ArraysHashCode.multibytes    10000  avgt   15   6587.068 ± 53.064  ns/op
ArraysHashCode.multichars        1  avgt   15      3.437 ±  0.042  ns/op
ArraysHashCode.multichars       10  avgt   15     13.019 ±  0.118  ns/op
ArraysHashCode.multichars      100  avgt   15     82.657 ±  0.244  ns/op
ArraysHashCode.multichars    10000  avgt   15   6743.844 ± 80.474  ns/op
ArraysHashCode.multiints         1  avgt   15      3.409 ±  0.036  ns/op
ArraysHashCode.multiints        10  avgt   15     13.102 ±  0.140  ns/op
ArraysHashCode.multiints       100  avgt   15     82.864 ±  1.002  ns/op
ArraysHashCode.multiints     10000  avgt   15   7107.843 ± 69.506  ns/op
ArraysHashCode.multishorts       1  avgt   15      3.475 ±  0.033  ns/op
ArraysHashCode.multishorts      10  avgt   15     12.923 ±  0.108  ns/op
ArraysHashCode.multishorts     100  avgt   15     82.498 ±  0.450  ns/op
ArraysHashCode.multishorts   10000  avgt   15   6744.477 ± 22.576  ns/op
ArraysHashCode.shorts            1  avgt   15     21.337 ±  0.077  ns/op
ArraysHashCode.shorts           10  avgt   15     33.236 ±  0.114  ns/op
ArraysHashCode.shorts          100  avgt   15    154.099 ±  0.421  ns/op
ArraysHashCode.shorts        10000  avgt   15  11876.918 ± 41.767  ns/op


2. licheepi-4a / with add-on change:

Benchmark                   (size)  Mode  Cnt      Score    Error  Units
ArraysHashCode.bytes             1  avgt   15     21.311 ±  0.036  ns/op
ArraysHashCode.bytes            10  avgt   15     32.113 ±  0.124  ns/op
ArraysHashCode.bytes           100  avgt   15    150.476 ±  0.635  ns/op
ArraysHashCode.bytes         10000  avgt   15  11639.521 ± 16.383  ns/op
ArraysHashCode.chars             1  avgt   15     21.329 ±  0.041  ns/op
ArraysHashCode.chars            10  avgt   15     32.315 ±  0.466  ns/op
ArraysHashCode.chars           100  avgt   15    151.996 ±  1.008  ns/op
ArraysHashCode.chars         10000  avgt   15  10957.449 ± 23.898  ns/op
ArraysHashCode.ints              1  avgt   15     21.323 ±  0.035  ns/op
ArraysHashCode.ints             10  avgt   15     32.416 ±  0.170  ns/op
ArraysHashCode.ints            100  avgt   15    152.277 ±  0.555  ns/op
ArraysHashCode.ints          10000  avgt   15  11019.286 ± 53.589  ns/op
ArraysHashCode.multibytes        1  avgt   15      3.450 ±  0.026  ns/op
ArraysHashCode.multibytes       10  avgt   15     12.204 ±  0.171  ns/op
ArraysHashCode.multibytes      100  avgt   15     78.433 ±  0.357  ns/op
ArraysHashCode.multibytes    10000  avgt   15   6654.488 ± 19.664  ns/op
ArraysHashCode.multichars        1  avgt   15      3.443 ±  0.043  ns/op
ArraysHashCode.multichars       10  avgt   15     12.364 ±  0.087  ns/op
ArraysHashCode.multichars      100  avgt   15     78.246 ±  0.540  ns/op
ArraysHashCode.multichars    10000  avgt   15   6455.363 ± 30.115  ns/op
ArraysHashCode.multiints         1  avgt   15      3.441 ±  0.019  ns/op
ArraysHashCode.multiints        10  avgt   15     12.493 ±  0.063  ns/op
ArraysHashCode.multiints       100  avgt   15     78.485 ±  0.587  ns/op
ArraysHashCode.multiints     10000  avgt   15   6843.608 ± 82.197  ns/op
ArraysHashCode.multishorts       1  avgt   15      3.466 ±  0.029  ns/op
ArraysHashCode.multishorts      10  avgt   15     12.369 ±  0.144  ns/op
ArraysHashCode.multishorts     100  avgt   15     78.172 ±  0.580  ns/op
ArraysHashCode.multishorts   10000  avgt   15   6446.791 ± 13.104  ns/op
ArraysHashCode.shorts            1  avgt   15     20.971 ±  0.574  ns/op
ArraysHashCode.shorts           10  avgt   15     32.002 ±  0.642  ns/op
ArraysHashCode.shorts          100  avgt   15    152.359 ±  0.692  ns/op
ArraysHashCode.shorts        10000  avgt   15  10968.816 ± 31.404  ns/op
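
To relate the "up to 7%+" figure to the two licheepi-4a tables, the relative improvement for a given row can be computed as below. This is only a small helper sketch; the function name `improvement_percent` is hypothetical and the inputs are the Score columns from the tables in this message.

```cpp
#include <cassert>

// Relative improvement in percent when the average time per operation
// drops from before_ns to after_ns (a positive result means faster).
inline double improvement_percent(double before_ns, double after_ns) {
  assert(before_ns > 0.0);
  return (before_ns - after_ns) / before_ns * 100.0;
}
```

For example, `improvement_percent(11881.690, 10957.449)` for `ArraysHashCode.chars` at size 10000 gives about 7.8%, while `ArraysHashCode.multibytes` at size 100 (75.963 vs 78.433) comes out slightly negative.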

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16629#issuecomment-1849459695

