RFR: 8319716: RISC-V: Add SHA-2 [v8]

Fri Dec 22 12:43:52 UTC 2023

On Fri, 22 Dec 2023 11:55:04 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3935:
>> 
>>> 3933:       // x0 is not written, we known the number of vector elements.
>>> 3934: 
>>> 3935:       __ vsetivli(x0, 4, vset_sew, Assembler::m1, Assembler::ma, Assembler::ta);
>> 
>> Shouldn't we set `LMUL` to `Assembler::m2` when `vset_sew` is `e64`? A single 128-bit wide vector register won't be able to hold 4 `e64` elements. I guess you were testing against 256-bit wide vector register so that this issue won't trigger? This also means most of the code won't work for sha512 with 128-bit RVV as `Assembler::m2` means register group of two. This menifests in the corresponding openssl sha512 implementation [1].
>> 
>> (With that obervation, should we have two independent code for each algorithm like the openssl version? That seems more readable and maintainable to me.)
>> 
>> [1] https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha512-riscv64-zvkb-zvknhb.pl
>
> The sha instruction don't care about lmul, except only in that lmul*vlen == 256.
> So there is no code difference except just setting m2 in case of 128.
> These crypto uses 'EGW' element group width which is specified.
> As quad rounds depend on result from previous quad round, you can only do 4xe64(for 256 e32), no more no less.
> 
> Yes, good catch! I must remember to tests multiple vlens. Thanks!

I suppose we would need some more if conditions to distinguish the register usage for the sha512 case in places like indexed load/store, message block load, etc. Because we will work with vector register pairs as indicated by `m2` for this case. I am not sure if that's a good way.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1435022104