RFR: 8316592: RISC-V: implement poly1305 intrinsic [v3]

Wed Nov 1 18:51:38 UTC 2023

On Wed, 1 Nov 2023 14:10:42 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> null has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Replace slli/add sequence with shadd instruction
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4492:
> 
>> 4490:     Label here;
>> 4491: 
>> 4492:     RegSet saved_regs = RegSet::range(x18, x25);
> 
> should x26/27 be saved too?

I don't think that's necessary. They're not used in this stub at all so we can save a couple of instructions by not storing and loading them.

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4498:
> 
>> 4496: 
>> 4497:     // Arguments
>> 4498:     const Register input_start = *regs, length = *++regs, acc_start = *++regs, r_start = *++regs;
> 
> Can you add some comments before this method about?
> 1. the corresponding java method name being intrinsified.
> 2. the mapping between the passed-in java arguments and resgiters used to hold the parameter.
> And seems its more clear to use specific name such as c_rarg0/1 for `arguments` instead of this iterator. (For other registers, iterator is fine).

Explicitly specified argument registers. I'll do the rest a bit later.

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4513:
> 
>> 4511:     const Register RR_0 = *++regs, RR_1 = *++regs;
>> 4512:     __ srli(tmp1, R_0, 2);
>> 4513:     __ shadd(RR_0, tmp1, tmp1, tmp2, 2);
> 
> As all shadd here are indeed sh2add, so can sh2add be used instead, so it saves using tmp2, and maybe we can save one register used in total by the method, the benefit is to make the code more readable in this case.

The `shadd` function also provides ability to do `slli` + `add` with no `Zba` enabled.

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4546:
> 
>> 4544:       // partial products without any risk of needing to propagate a
>> 4545:       // carry out.
>> 4546:       wide_mul(U_0, U_0HI, S_0, R_0);  wide_madd(U_0, U_0HI, S_1, RR_1, tmp1, tmp2); wide_madd(U_0, U_0HI, S_2, RR_0, tmp1, tmp2);
> 
> I'm not sure why in aarch64 version it's written in this format, but would it be better to have one line for each instruction?

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1379178832
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1379179230
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1379178999
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1379180530