RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v10]

Thu Jun 13 11:27:50 UTC 2024

On Thu, 6 Jun 2024 01:57:08 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Fix vrsub_vi for case of vlen > 128
>>  - Add process_bytes_by32 function
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5168:
> 
>> 5166:   void adler32_process_bytes_by16(Register buff, Register s1, Register s2, Register right_16_bits,
>> 5167:     VectorRegister vtable, VectorRegister vzero, VectorRegister *vbytes, VectorRegister *vs1acc, VectorRegister *vs2acc, 
>> 5168:     Register temp0, Register temp1, Register temp2, VectorRegister vtemp1, VectorRegister vtemp2, int LMUL) {
> 
> Let's remove this `LMUL` param, as all the call sites now pass the value 1.
> 
> Question: Did you consider unifying adler32_process_bytes_by16/32/64 into one function with an extra param indicating the size? It seems to me that they duplicate most of the code. And I guess it should make no big difference for the 16 variant to do a vector-widening reduction sum at the end, just like the other two?
> 
> BTW: I can help test the performance difference, as I have just added a Banana Pi to my RV testing army.

Thanks, I unified the code for all step sizes. Could you please do a re-run on the Banana Pi to check performance?
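
For illustration only, here is a minimal scalar sketch of the unification idea: one routine that takes the step size as a parameter instead of separate by16/by32/by64 variants. This is not the stub generator code. The names `adler32_process_block` and `adler32_by_step` are hypothetical, and the weighted sum with weights step..1 is an assumption about what the vector multiply-and-reduce computes per block.

```cpp
#include <cstddef>
#include <cstdint>

// Largest prime below 2^16, the Adler-32 modulus.
static const uint32_t ADLER_BASE = 65521;

// Hypothetical scalar model of one unified step: consumes `step` bytes and
// updates s1/s2. The same code serves step = 16, 32 or 64.
static void adler32_process_block(const uint8_t* buff, size_t step,
                                  uint32_t& s1, uint32_t& s2) {
  uint32_t byte_sum = 0;      // sum of the bytes in this block
  uint32_t weighted_sum = 0;  // sum of (step - i) * buff[i], weights step..1
  for (size_t i = 0; i < step; i++) {
    byte_sum += buff[i];
    weighted_sum += static_cast<uint32_t>(step - i) * buff[i];
  }
  // s2 accumulates the old s1 once per byte, plus the weighted bytes.
  s2 = (s2 + static_cast<uint32_t>(step) * s1 + weighted_sum) % ADLER_BASE;
  s1 = (s1 + byte_sum) % ADLER_BASE;
}

// Hypothetical driver: whole blocks of `step` bytes, then a byte-wise tail.
static uint32_t adler32_by_step(const uint8_t* buff, size_t len, size_t step) {
  uint32_t s1 = 1, s2 = 0;
  size_t i = 0;
  for (; i + step <= len; i += step) {
    adler32_process_block(buff + i, step, s1, s2);
  }
  for (; i < len; i++) {
    s1 = (s1 + buff[i]) % ADLER_BASE;
    s2 = (s2 + s1) % ADLER_BASE;
  }
  return (s2 << 16) | s1;
}
```

Since the checksum does not depend on the step size, calling `adler32_by_step` with 16, 32 or 64 on the same buffer should return the same value, so only the performance should differ between the variants.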

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1638044709