RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v17]

Tue Jul 16 11:11:10 UTC 2024

On Tue, 16 Jul 2024 06:49:12 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Use t2 instead of count as scratch register
>>  - Remove blt after by16_loop_unroll
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5048:
> 
>> 5046:     VectorRegister vzero, VectorRegister vbytes, VectorRegister vs1acc, VectorRegister vs2acc,
>> 5047:     Register temp0, Register temp1, Register temp2,  Register temp3,
>> 5048:     VectorRegister vtemp1, VectorRegister vtemp2, int step, Assembler::LMUL LMUL) {
> 
> Better to use lowercase `lmul` and `lmulx2` for `LMUL` and `LMULx2` respectively, to be consistent in naming style.

Done

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5277:
> 
>> 5275:     __ sub(len, len, step_64);
>> 5276:     // By now the count should still be 64
>> 5277:     __ bge(len, count, L_by16_loop_unroll);
> 
> The code comment and `count` here need updating as well when you change to `t2`.

Done modulo one minor note: I changed my mind on `t2` and used `temp3` instead (as it is a known alias)

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5290:
> 
>> 5288:     __ bltz(len, L_do_mod);
>> 5289: 
>> 5290:   __ bind(L_by1_loop);
> 
> The loop bodies of `L_by1_loop` and `L_simple_by1_loop` look the same except for the branch at the end. Could we eliminate `L_simple_by1_loop` and jump to `L_by1_loop` instead? It seems the only thing needed is to subtract one from `len` before the jump.

Done!
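
For readers following along without the patch open, the idea of the merged loop can be sketched in plain C++. This is only a scalar illustration, not the stub code: the function name `adler32_by1` and the constant `BASE` are hypothetical, and the modulo is applied on every byte for simplicity, which the real stub presumably does not do.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for the byte-at-a-time path (not the HotSpot stub).
// A single loop serves both entry points: the caller-side "len -= 1" lets the
// loop test "len >= 0", mirroring the subtract-by-one-then-jump suggestion.
static uint32_t adler32_by1(uint32_t adler, const uint8_t* buf, std::ptrdiff_t len) {
  const uint32_t BASE = 65521;          // largest prime below 2^16
  uint32_t s1 = adler & 0xffff;         // low half: running byte sum
  uint32_t s2 = (adler >> 16) & 0xffff; // high half: running sum of s1

  len -= 1;                             // pre-decrement before entering the shared loop
  while (len >= 0) {
    s1 = (s1 + *buf++) % BASE;          // simplification: reduce on every byte
    s2 = (s2 + s1) % BASE;
    len -= 1;
  }
  return (s2 << 16) | s1;
}
```

Called with an initial value of 1 and a zero length, the sketch returns 1 unchanged, since the pre-decrement makes `len` negative and the loop body is skipped.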

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1679204213
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1679204177
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1679204163