RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v10]

Yuri Gaevsky duke at openjdk.org
Fri Jul 18 11:13:52 UTC 2025


On Thu, 17 Jul 2025 12:41:47 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

> Looking at the JMH numbers, it's interesting to find that `-XX:DisableIntrinsic=_vectorizedHashCode` outperforms `-XX:-UseRVV`. If that is the case, then why would we want the scalar version (that is `C2_MacroAssembler::arrays_hashcode()`)?

I've just found that the following change:

$ git diff
diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
index c62997310b3..f98b48adccd 100644
--- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
@@ -1953,16 +1953,15 @@ void C2_MacroAssembler::arrays_hashcode(Register ary, Register cnt, Register res
   mv(pow31_3,  29791);           // [31^^3]
   mv(pow31_2,    961);           // [31^^2]
 
-  slli(chunks_end, chunks, chunks_end_shift);
-  add(chunks_end, ary, chunks_end);
+  shadd(chunks_end, chunks, ary, t0, chunks_end_shift);
   andi(cnt, cnt, stride - 1);    // don't forget about tail!
 
   bind(WIDE_LOOP);
-  mulw(result, result, pow31_4); // 31^^4 * h
   arrays_hashcode_elload(t0,   Address(ary, 0 * elsize), eltype);
   arrays_hashcode_elload(t1,   Address(ary, 1 * elsize), eltype);
   arrays_hashcode_elload(tmp5, Address(ary, 2 * elsize), eltype);
   arrays_hashcode_elload(tmp6, Address(ary, 3 * elsize), eltype);
+  mulw(result, result, pow31_4); // 31^^4 * h
   mulw(t0, t0, pow31_3);         // 31^^3 * ary[i+0]
   addw(result, result, t0);
   mulw(t1, t1, pow31_2);         // 31^^2 * ary[i+1]
@@ -1977,8 +1976,7 @@ void C2_MacroAssembler::arrays_hashcode(Register ary, Register cnt, Register res
   beqz(cnt, DONE);
 
   bind(TAIL);
-  slli(chunks_end, cnt, chunks_end_shift);
-  add(chunks_end, ary, chunks_end);
+  shadd(chunks_end, cnt, ary, t0, chunks_end_shift);
 
   bind(TAIL_LOOP);
   arrays_hashcode_elload(t0, Address(ary), eltype);

makes the numbers good again on BPI-F3 as well (mostly due to moving `mulw` down in the loop):

--- -XX:DisableIntrinsic=_vectorizedHashCode ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.271 ± 0.003  ns/op
ArraysHashCode.ints       5  avgt   10   28.910 ± 0.036  ns/op
ArraysHashCode.ints      10  avgt   10   41.176 ± 0.383  ns/op
ArraysHashCode.ints      20  avgt   10   68.236 ± 0.087  ns/op
ArraysHashCode.ints      30  avgt   10   88.215 ± 0.272  ns/op
ArraysHashCode.ints      40  avgt   10  115.218 ± 0.065  ns/op
ArraysHashCode.ints      50  avgt   10  135.834 ± 0.374  ns/op
ArraysHashCode.ints      60  avgt   10  162.042 ± 0.488  ns/op
ArraysHashCode.ints      70  avgt   10  170.784 ± 0.538  ns/op
ArraysHashCode.ints      80  avgt   10  194.294 ± 0.407  ns/op
ArraysHashCode.ints      90  avgt   10  208.811 ± 0.289  ns/op
ArraysHashCode.ints     100  avgt   10  231.826 ± 0.471  ns/op
ArraysHashCode.ints     200  avgt   10  446.403 ± 0.491  ns/op
ArraysHashCode.ints     300  avgt   10  655.815 ± 0.603  ns/op
--- -XX:-UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.281 ± 0.004  ns/op
ArraysHashCode.ints       5  avgt   10   23.178 ± 0.011  ns/op
ArraysHashCode.ints      10  avgt   10   33.183 ± 0.018  ns/op
ArraysHashCode.ints      20  avgt   10   50.778 ± 0.027  ns/op
ArraysHashCode.ints      30  avgt   10   70.892 ± 0.153  ns/op
ArraysHashCode.ints      40  avgt   10   88.292 ± 0.018  ns/op
ArraysHashCode.ints      50  avgt   10  108.978 ± 0.269  ns/op
ArraysHashCode.ints      60  avgt   10  126.010 ± 0.064  ns/op
ArraysHashCode.ints      70  avgt   10  146.115 ± 0.252  ns/op
ArraysHashCode.ints      80  avgt   10  163.453 ± 0.078  ns/op
ArraysHashCode.ints      90  avgt   10  184.433 ± 0.256  ns/op
ArraysHashCode.ints     100  avgt   10  201.002 ± 0.036  ns/op
ArraysHashCode.ints     200  avgt   10  388.929 ± 0.254  ns/op
ArraysHashCode.ints     300  avgt   10  577.083 ± 0.325  ns/op

The change still performs well on the other hardware mentioned earlier.
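A plain C++ model of the scalar loop may help when reading the diff. It is a hypothetical sketch, not the HotSpot code. The names hash_sequential, hash_unrolled4 and self_check are made up, and it models only int elements with stride 4, as the four loads and the pow31_4/pow31_3/pow31_2 constants suggest. The wide loop relies on the identity h' = 31^4*h + 31^3*a[i] + 31^2*a[i+1] + 31*a[i+2] + a[i+3] (mod 2^32), so one iteration equals four steps of h = 31*h + a[i]. Because the product 31^4*h does not depend on the loaded elements, moving that `mulw` below the loads changes only the instruction order, not the result. As the diff shows, shadd(rd, rs1, rs2, tmp, shamt) replaces the slli+add pair, i.e. rd = rs2 + (rs1 << shamt). In the sketch that address computation is ordinary pointer arithmetic.

```cpp
#include <cassert>  // assert() in self_check()
#include <cstddef>
#include <cstdint>

// Sequential recurrence h = 31 * h + a[i], as in java.util.Arrays.hashCode(int[]).
// uint32_t gives the same 32-bit wrap-around as Java int arithmetic
// without signed-overflow undefined behaviour.
uint32_t hash_sequential(const int32_t* a, size_t n, uint32_t h) {
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}

// Hypothetical model of the WIDE_LOOP / TAIL_LOOP structure (stride 4).
uint32_t hash_unrolled4(const int32_t* ary, size_t cnt, uint32_t result) {
  const uint32_t pow31_4 = 923521;  // 31^4
  const uint32_t pow31_3 = 29791;   // 31^3
  const uint32_t pow31_2 = 961;     // 31^2
  const size_t stride = 4;

  // End of the whole 4-element chunks (the role chunks_end plays in the diff).
  const int32_t* chunks_end = ary + (cnt & ~(stride - 1));
  cnt &= stride - 1;  // don't forget about tail!

  while (ary != chunks_end) {  // WIDE_LOOP
    // Loads first, then the multiply of result by 31^4, as in the reordered body.
    uint32_t e0 = static_cast<uint32_t>(ary[0]);
    uint32_t e1 = static_cast<uint32_t>(ary[1]);
    uint32_t e2 = static_cast<uint32_t>(ary[2]);
    uint32_t e3 = static_cast<uint32_t>(ary[3]);
    result *= pow31_4;       // 31^4 * h
    result += pow31_3 * e0;  // 31^3 * ary[i+0]
    result += pow31_2 * e1;  // 31^2 * ary[i+1]
    result += 31u * e2;      // 31   * ary[i+2]
    result += e3;            //        ary[i+3]
    ary += stride;
  }

  for (const int32_t* end = ary + cnt; ary != end; ary++) {  // TAIL_LOOP
    result = 31u * result + static_cast<uint32_t>(*ary);
  }
  return result;
}

// Both versions must agree for every length 0..40, including negative
// elements and values large enough to wrap around.
void self_check() {
  int32_t data[40];
  for (int i = 0; i < 40; i++) {
    data[i] = (i % 2) ? INT32_MIN + i * 7 : INT32_MAX - i * 13;
  }
  for (size_t n = 0; n <= 40; n++) {
    assert(hash_unrolled4(data, n, 1) == hash_sequential(data, n, 1));
  }
}
```

Starting from 1, as Arrays.hashCode does, both functions return 30817 for {1, 2, 3}. That length never enters the wide loop, so the 5-element case {1, 2, 3, 4, 5} (29615266) is a better spot check of the WIDE_LOOP/TAIL split.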

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3088680935

