RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v10]
ArsenyBochkarev
duke at openjdk.org
Thu Jun 13 11:27:50 UTC 2024
On Thu, 6 Jun 2024 01:57:08 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Fix vrsub_vi for case of vlen > 128
>> - Add process_bytes_by32 function
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5168:
>
>> 5166: void adler32_process_bytes_by16(Register buff, Register s1, Register s2, Register right_16_bits,
>> 5167: VectorRegister vtable, VectorRegister vzero, VectorRegister *vbytes, VectorRegister *vs1acc, VectorRegister *vs2acc,
>> 5168: Register temp0, Register temp1, Register temp2, VectorRegister vtemp1, VectorRegister vtemp2, int LMUL) {
>
> Let's remove this `LMUL` param, as all the call sites now pass the value 1.
>
> Question: Did you consider unifying adler32_process_bytes_by16/32/64 into one function with one extra param indicating the size? Seems to me that they duplicate most of the code. And I guess there should be no big difference for the 16 variant to do vector-widening reduction sum at the end just like the other two?
>
> BTW: I can help test the performance difference as I have just added Banana-PI into my RV testing army.
Thanks, I unified the code for all step sizes. Could you please re-run the performance tests on the Banana PI?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1638044709