RFR: 8321008: RISC-V: C2 MulAddVS2VI [v2]

Mon Apr 29 15:28:09 UTC 2024

On Mon, 29 Apr 2024 13:31:18 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/riscv_v.ad line 898:
>> 
>>> 896: 
>>> 897:     __ vmul_vv(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), as_VectorRegister($tmp2$$reg));
>>> 898:     __ vmacc_vv(as_VectorRegister($dst$$reg), as_VectorRegister($tmp1$$reg), as_VectorRegister($tmp3$$reg));
>> 
>> Hmm ... This doesn't look like a simple/straightforward sequence, does it? It's hard to tell whether we will benefit from this change without JMH testing on real RVV hardware, especially when VLEN is not large (at least there is no big difference in the number of instructions executed when VLEN=128 bits).
>
> You're right.
> I'm waiting for my board to test it. I'll update when I get the data.

The general advantage over having no intrinsic is that we avoid spilling from the vector register to the heap when execution falls back to whatever C2 would otherwise compile. That avoids a hit to the L1 cache, or even to the CPU's load/store pipe. I agree, though: we'll need JMH runs on actual hardware to verify that it's indeed a win.
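For reference when building such a benchmark, here is a minimal scalar sketch of the computation MulAddVS2VI stands for: each int output lane is the sum of two widened short products taken from adjacent element pairs. This assumes the usual short-pair multiply-add loop shape; the class and method names are hypothetical, and whether C2 actually turns this loop into MulAddVS2VI depends on the platform, the backend rules, and the VM flags in use.

```java
import java.util.Arrays;

// Hypothetical scalar kernel; the names are illustrative, not from the PR.
class MulAddS2IKernel {

    // For each output lane i, multiply the adjacent short pairs (2*i, 2*i+1)
    // of a and b (Java promotes short*short to int) and sum the two products.
    // A loop of this shape is what C2's superword pass may vectorize into
    // MulAddVS2VI when the backend provides a matching rule.
    static void mulAdd(short[] a, short[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
        }
    }

    public static void main(String[] args) {
        short[] a = {1, 2, 3, 4, Short.MAX_VALUE, Short.MAX_VALUE};
        short[] b = {5, 6, 7, 8, Short.MAX_VALUE, Short.MAX_VALUE};
        int[] out = new int[3];
        mulAdd(a, b, out);
        // prints [17, 53, 2147352578]
        System.out.println(Arrays.toString(out));
    }
}
```

The last lane shows why the products are widened to int before the add: 32767 * 32767 * 2 still fits in an int, but not in a short.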

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583276523