RFR: 8318217: RISC-V: C2 VectorizedHashCode [v12]
Fei Yang
fyang at openjdk.org
Mon Dec 11 07:29:22 UTC 2023
On Sun, 10 Dec 2023 12:14:35 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> Hello All,
>>
>> Please review these changes to support _vectorizedHashCode intrinsic on
>> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
>> usage of any RVV instruction but provides manual unrolling of the appropriate
>> loop. The code with usage of RVV instruction could be added as follow-up of
>> the patch or independently.
>>
>> Thanks,
>> -Yuri Gaevsky
>>
>> P.S. My OCA has been accepted recently (ygaevsky).
>>
>> ### Correctness checks
>>
>> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>>
>> ### Performance results (the numbers for non-ints are similar)
>>
>> #### StarFive JH7110 board:
>>
>>
>> ArraysHashCode:                   without intrinsic        with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
>> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
>> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
>> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
>> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
>> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
>> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
>> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
>> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
>> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
>> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
>> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
>> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
>> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
>> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
>> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
>> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
>> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> Moved zero check for cnt before TAIL per @RealFYang suggestion.
Thanks for the update. I gave it a second try and did some tuning. I see up to 7%+ extra improvement on a licheepi-4a board (T-Head C910) with the following small add-on change (no obvious change on the unmatched board). It materializes the powers of 31 with direct `mv` instructions and avoids loading them from the `_arrays_hashcode_powers_of_31` array, which would involve calculating the array address. We could then remove the `_arrays_hashcode_powers_of_31` array altogether.
diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
index 11cbcaa48a1..fe82b7a4e74 100644
--- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
@@ -1493,16 +1493,16 @@ void C2_MacroAssembler::arrays_hashcode(Register ary, Register cnt, Register res
beqz(cnt, DONE);
- addiw(pow31_2, zr, 961); // [31^^2]
andi(chunks, cnt, ~(stride-1));
beqz(chunks, TAIL);
+ mv(pow31_4, 923521); // [31^^4]
+ mv(pow31_3, 29791); // [31^^3]
+ mv(pow31_2, 961); // [31^^2]
+
slli(chunks_end, chunks, chunks_end_shift);
add(chunks_end, ary, chunks_end);
andi(cnt, cnt, stride-1); // don't forget about tail!
- ld(pow31_4, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31()
- + 0 * sizeof(jint))); // [31^^3:31^^4]
- srli(pow31_3, pow31_4, 32);
bind(WIDE_LOOP);
mulw(result, result, pow31_4); // 31^^4 * h
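For readers less familiar with the intrinsic, here is a minimal Java sketch of the arithmetic that the unrolled loop relies on. It is an illustration only, not the HotSpot code: the class and method names are hypothetical, and a stride of 4 is assumed from the `pow31_2`/`pow31_3`/`pow31_4` registers in the diff. The three constants are the same immediates the add-on change materializes with `mv`.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or of the patch.
final class UnrolledHash {
    // Powers of 31, matching the immediates in the diff above.
    static final int POW31_2 = 961;     // 31^2
    static final int POW31_3 = 29791;   // 31^3
    static final int POW31_4 = 923521;  // 31^4

    // Polynomial hash h = 31*h + a[i], processed four elements at a time
    // (stride of 4 is an assumption), followed by a scalar tail loop.
    static int hash(int[] a, int initial) {
        int h = initial;
        int i = 0;
        int chunks = a.length & ~3; // element count rounded down to a multiple of 4
        for (; i < chunks; i += 4) {
            // Equivalent to four successive h = 31*h + a[k] steps;
            // Java int arithmetic wraps modulo 2^32.
            h = POW31_4 * h
              + POW31_3 * a[i]
              + POW31_2 * a[i + 1]
              + 31 * a[i + 2]
              + a[i + 3];
        }
        for (; i < a.length; i++) { // tail: remaining 0..3 elements
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {1, -2, 3, 40, 500, -6000, 70000};
        // Arrays.hashCode(int[]) starts from an initial value of 1.
        System.out.println(hash(data, 1) == Arrays.hashCode(data));
    }
}
```

Because the constants are fixed and small, they can be built from immediates rather than fetched from memory, which is the point of the add-on change.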
1. licheepi-4a / without add-on change:
Benchmark                   (size)  Mode  Cnt      Score     Error  Units
ArraysHashCode.bytes             1  avgt   15     21.327 ±   0.035  ns/op
ArraysHashCode.bytes            10  avgt   15     33.195 ±   0.166  ns/op
ArraysHashCode.bytes           100  avgt   15    154.175 ±   3.433  ns/op
ArraysHashCode.bytes         10000  avgt   15  12318.680 ±  25.131  ns/op
ArraysHashCode.chars             1  avgt   15     20.965 ±   0.598  ns/op
ArraysHashCode.chars            10  avgt   15     33.097 ±   0.117  ns/op
ArraysHashCode.chars           100  avgt   15    153.510 ±   0.280  ns/op
ArraysHashCode.chars         10000  avgt   15  11881.690 ±  44.507  ns/op
ArraysHashCode.ints              1  avgt   15     21.330 ±   0.070  ns/op
ArraysHashCode.ints             10  avgt   15     33.409 ±   0.225  ns/op
ArraysHashCode.ints            100  avgt   15    154.254 ±   0.650  ns/op
ArraysHashCode.ints          10000  avgt   15  11833.894 ±  73.945  ns/op
ArraysHashCode.multibytes        1  avgt   15      3.468 ±   0.046  ns/op
ArraysHashCode.multibytes       10  avgt   15     12.412 ±   0.126  ns/op
ArraysHashCode.multibytes      100  avgt   15     75.963 ±   0.267  ns/op
ArraysHashCode.multibytes    10000  avgt   15   6587.068 ±  53.064  ns/op
ArraysHashCode.multichars        1  avgt   15      3.437 ±   0.042  ns/op
ArraysHashCode.multichars       10  avgt   15     13.019 ±   0.118  ns/op
ArraysHashCode.multichars      100  avgt   15     82.657 ±   0.244  ns/op
ArraysHashCode.multichars    10000  avgt   15   6743.844 ±  80.474  ns/op
ArraysHashCode.multiints         1  avgt   15      3.409 ±   0.036  ns/op
ArraysHashCode.multiints        10  avgt   15     13.102 ±   0.140  ns/op
ArraysHashCode.multiints       100  avgt   15     82.864 ±   1.002  ns/op
ArraysHashCode.multiints     10000  avgt   15   7107.843 ±  69.506  ns/op
ArraysHashCode.multishorts       1  avgt   15      3.475 ±   0.033  ns/op
ArraysHashCode.multishorts      10  avgt   15     12.923 ±   0.108  ns/op
ArraysHashCode.multishorts     100  avgt   15     82.498 ±   0.450  ns/op
ArraysHashCode.multishorts   10000  avgt   15   6744.477 ±  22.576  ns/op
ArraysHashCode.shorts            1  avgt   15     21.337 ±   0.077  ns/op
ArraysHashCode.shorts           10  avgt   15     33.236 ±   0.114  ns/op
ArraysHashCode.shorts          100  avgt   15    154.099 ±   0.421  ns/op
ArraysHashCode.shorts        10000  avgt   15  11876.918 ±  41.767  ns/op
2. licheepi-4a / with add-on change:
Benchmark                   (size)  Mode  Cnt      Score     Error  Units
ArraysHashCode.bytes             1  avgt   15     21.311 ±   0.036  ns/op
ArraysHashCode.bytes            10  avgt   15     32.113 ±   0.124  ns/op
ArraysHashCode.bytes           100  avgt   15    150.476 ±   0.635  ns/op
ArraysHashCode.bytes         10000  avgt   15  11639.521 ±  16.383  ns/op
ArraysHashCode.chars             1  avgt   15     21.329 ±   0.041  ns/op
ArraysHashCode.chars            10  avgt   15     32.315 ±   0.466  ns/op
ArraysHashCode.chars           100  avgt   15    151.996 ±   1.008  ns/op
ArraysHashCode.chars         10000  avgt   15  10957.449 ±  23.898  ns/op
ArraysHashCode.ints              1  avgt   15     21.323 ±   0.035  ns/op
ArraysHashCode.ints             10  avgt   15     32.416 ±   0.170  ns/op
ArraysHashCode.ints            100  avgt   15    152.277 ±   0.555  ns/op
ArraysHashCode.ints          10000  avgt   15  11019.286 ±  53.589  ns/op
ArraysHashCode.multibytes        1  avgt   15      3.450 ±   0.026  ns/op
ArraysHashCode.multibytes       10  avgt   15     12.204 ±   0.171  ns/op
ArraysHashCode.multibytes      100  avgt   15     78.433 ±   0.357  ns/op
ArraysHashCode.multibytes    10000  avgt   15   6654.488 ±  19.664  ns/op
ArraysHashCode.multichars        1  avgt   15      3.443 ±   0.043  ns/op
ArraysHashCode.multichars       10  avgt   15     12.364 ±   0.087  ns/op
ArraysHashCode.multichars      100  avgt   15     78.246 ±   0.540  ns/op
ArraysHashCode.multichars    10000  avgt   15   6455.363 ±  30.115  ns/op
ArraysHashCode.multiints         1  avgt   15      3.441 ±   0.019  ns/op
ArraysHashCode.multiints        10  avgt   15     12.493 ±   0.063  ns/op
ArraysHashCode.multiints       100  avgt   15     78.485 ±   0.587  ns/op
ArraysHashCode.multiints     10000  avgt   15   6843.608 ±  82.197  ns/op
ArraysHashCode.multishorts       1  avgt   15      3.466 ±   0.029  ns/op
ArraysHashCode.multishorts      10  avgt   15     12.369 ±   0.144  ns/op
ArraysHashCode.multishorts     100  avgt   15     78.172 ±   0.580  ns/op
ArraysHashCode.multishorts   10000  avgt   15   6446.791 ±  13.104  ns/op
ArraysHashCode.shorts            1  avgt   15     20.971 ±   0.574  ns/op
ArraysHashCode.shorts           10  avgt   15     32.002 ±   0.642  ns/op
ArraysHashCode.shorts          100  avgt   15    152.359 ±   0.692  ns/op
ArraysHashCode.shorts        10000  avgt   15  10968.816 ±  31.404  ns/op
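
As a quick cross-check of the "up to 7%+" figure, the relative reduction in ns/op can be computed directly from the size-10000 scores in the two licheepi-4a runs above. This is a small illustrative sketch; the class and method names are hypothetical, and it ignores the reported error margins.

```java
// Hypothetical helper for comparing the two benchmark runs; not part of the patch.
final class Improvement {
    // Relative reduction in time per operation, in percent.
    static double percent(double beforeNsPerOp, double afterNsPerOp) {
        return (beforeNsPerOp - afterNsPerOp) / beforeNsPerOp * 100.0;
    }

    public static void main(String[] args) {
        // Scores copied from the size-10000 rows of the two runs.
        System.out.printf("bytes  10000: %.2f%%%n", percent(12318.680, 11639.521));
        System.out.printf("chars  10000: %.2f%%%n", percent(11881.690, 10957.449));
        System.out.printf("ints   10000: %.2f%%%n", percent(11833.894, 11019.286));
        System.out.printf("shorts 10000: %.2f%%%n", percent(11876.918, 10968.816));
    }
}
```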
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16629#issuecomment-1849459695