RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v10]
Yuri Gaevsky
duke at openjdk.org
Fri Jul 18 11:13:52 UTC 2025
On Thu, 17 Jul 2025 12:41:47 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
> Looking at the JMH numbers, it's interesting to find that `-XX:DisableIntrinsic=_vectorizedHashCode` outperforms `-XX:-UseRVV`. If that is the case, then why would we want the scalar version (that is `C2_MacroAssembler::arrays_hashcode()`)?
I've just found that the following change:
$ git diff
diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
index c62997310b3..f98b48adccd 100644
--- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
@@ -1953,16 +1953,15 @@ void C2_MacroAssembler::arrays_hashcode(Register ary, Register cnt, Register res
mv(pow31_3, 29791); // [31^^3]
mv(pow31_2, 961); // [31^^2]
- slli(chunks_end, chunks, chunks_end_shift);
- add(chunks_end, ary, chunks_end);
+ shadd(chunks_end, chunks, ary, t0, chunks_end_shift);
andi(cnt, cnt, stride - 1); // don't forget about tail!
bind(WIDE_LOOP);
- mulw(result, result, pow31_4); // 31^^4 * h
arrays_hashcode_elload(t0, Address(ary, 0 * elsize), eltype);
arrays_hashcode_elload(t1, Address(ary, 1 * elsize), eltype);
arrays_hashcode_elload(tmp5, Address(ary, 2 * elsize), eltype);
arrays_hashcode_elload(tmp6, Address(ary, 3 * elsize), eltype);
+ mulw(result, result, pow31_4); // 31^^4 * h
mulw(t0, t0, pow31_3); // 31^^3 * ary[i+0]
addw(result, result, t0);
mulw(t1, t1, pow31_2); // 31^^2 * ary[i+1]
@@ -1977,8 +1976,7 @@ void C2_MacroAssembler::arrays_hashcode(Register ary, Register cnt, Register res
beqz(cnt, DONE);
bind(TAIL);
- slli(chunks_end, cnt, chunks_end_shift);
- add(chunks_end, ary, chunks_end);
+ shadd(chunks_end, cnt, ary, t0, chunks_end_shift);
bind(TAIL_LOOP);
arrays_hashcode_elload(t0, Address(ary), eltype);
makes the numbers good again on BPI-F3 as well (mostly due to moving `mulw` down in the loop):
--- -XX:DisableIntrinsic=_vectorizedHashCode ---
Benchmark (size) Mode Cnt Score Error Units
ArraysHashCode.ints 1 avgt 10 11.271 ± 0.003 ns/op
ArraysHashCode.ints 5 avgt 10 28.910 ± 0.036 ns/op
ArraysHashCode.ints 10 avgt 10 41.176 ± 0.383 ns/op
ArraysHashCode.ints 20 avgt 10 68.236 ± 0.087 ns/op
ArraysHashCode.ints 30 avgt 10 88.215 ± 0.272 ns/op
ArraysHashCode.ints 40 avgt 10 115.218 ± 0.065 ns/op
ArraysHashCode.ints 50 avgt 10 135.834 ± 0.374 ns/op
ArraysHashCode.ints 60 avgt 10 162.042 ± 0.488 ns/op
ArraysHashCode.ints 70 avgt 10 170.784 ± 0.538 ns/op
ArraysHashCode.ints 80 avgt 10 194.294 ± 0.407 ns/op
ArraysHashCode.ints 90 avgt 10 208.811 ± 0.289 ns/op
ArraysHashCode.ints 100 avgt 10 231.826 ± 0.471 ns/op
ArraysHashCode.ints 200 avgt 10 446.403 ± 0.491 ns/op
ArraysHashCode.ints 300 avgt 10 655.815 ± 0.603 ns/op
--- -XX:-UseRVV ---
Benchmark (size) Mode Cnt Score Error Units
ArraysHashCode.ints 1 avgt 10 11.281 ± 0.004 ns/op
ArraysHashCode.ints 5 avgt 10 23.178 ± 0.011 ns/op
ArraysHashCode.ints 10 avgt 10 33.183 ± 0.018 ns/op
ArraysHashCode.ints 20 avgt 10 50.778 ± 0.027 ns/op
ArraysHashCode.ints 30 avgt 10 70.892 ± 0.153 ns/op
ArraysHashCode.ints 40 avgt 10 88.292 ± 0.018 ns/op
ArraysHashCode.ints 50 avgt 10 108.978 ± 0.269 ns/op
ArraysHashCode.ints 60 avgt 10 126.010 ± 0.064 ns/op
ArraysHashCode.ints 70 avgt 10 146.115 ± 0.252 ns/op
ArraysHashCode.ints 80 avgt 10 163.453 ± 0.078 ns/op
ArraysHashCode.ints 90 avgt 10 184.433 ± 0.256 ns/op
ArraysHashCode.ints 100 avgt 10 201.002 ± 0.036 ns/op
ArraysHashCode.ints 200 avgt 10 388.929 ± 0.254 ns/op
ArraysHashCode.ints 300 avgt 10 577.083 ± 0.325 ns/op
And it's still good on other hardware mentioned earlier.
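For readers who do not have the intrinsic in front of them, here is a rough Java sketch of the arithmetic that the wide loop in the diff performs, assuming a stride of 4 as the four `arrays_hashcode_elload` calls suggest. The class and method names (`UnrolledHashSketch`, `hashSimple`, `hashUnrolled`) are made up for illustration and are not part of the patch. It only shows that the unrolled form with the constants 31^2 = 961, 31^3 = 29791 and 31^4 gives the same result as the plain `h = 31 * h + a[i]` recurrence; it says nothing about instruction scheduling, which is what the `mulw` move actually changes.

```java
import java.util.Arrays;

// Illustrative sketch only: mirrors the math of the 4-way unrolled scalar loop,
// not the generated RISC-V code.
final class UnrolledHashSketch {

    // Plain recurrence h = 31 * h + a[i], starting from 1,
    // which is what Arrays.hashCode(int[]) is documented to compute.
    static int hashSimple(int[] a) {
        int h = 1;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // Four elements per iteration:
    // h = 31^4 * h + 31^3 * a[i] + 31^2 * a[i+1] + 31 * a[i+2] + a[i+3]
    static int hashUnrolled(int[] a) {
        final int pow31_2 = 31 * 31;       // 961
        final int pow31_3 = pow31_2 * 31;  // 29791
        final int pow31_4 = pow31_3 * 31;  // 923521

        int h = 1;
        int i = 0;
        int wideEnd = a.length & ~3;       // largest multiple of 4 not above length

        for (; i < wideEnd; i += 4) {
            // Load all four elements first; the multiply of h comes afterwards,
            // in the same order as the patched loop body.
            int e0 = a[i];
            int e1 = a[i + 1];
            int e2 = a[i + 2];
            int e3 = a[i + 3];
            h = pow31_4 * h + pow31_3 * e0 + pow31_2 * e1 + 31 * e2 + e3;
        }

        // Tail: the remaining (length & 3) elements, one at a time.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        System.out.println(hashUnrolled(data) == Arrays.hashCode(data));
    }
}
```

Java `int` arithmetic wraps modulo 2^32, so the two forms agree for any input, including values that overflow.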
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3088680935