RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v8]
ArsenyBochkarev
duke at openjdk.org
Mon Jun 3 19:17:08 UTC 2024
On Mon, 27 May 2024 12:07:13 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>>
>> - Partially unroll L_by16_loop
>> - Fix by64 function for vlen > 128
>> - Fix by16 function for vlen > 128
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5069:
>
>> 5067:
>> 5068: // Load data
>> 5069: __ vsetvli(temp0, count, Assembler::e8, Assembler::m4);
>
> Maybe add a simple assertion about `count` before this to make sure that it equals 64 on entry? Or let this function initialize `count` to 64, which I guess won't impact performance much.
I see. There are no way to check this at compile time, so I added `count` initialization to 64 in `adler32_process_bytes_by64`
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5083:
>
>> 5081: // Summing up calculated results for s2_new
>> 5082: __ vsetvli(temp0, count, Assembler::e16, Assembler::m4);
>> 5083: // 0xFF * 0x10 = 0xFF0 max per single vector element,
>
> I don't quite understand this line of code comment. What does `0x10` here stands for?
These comment lines were about upper bounds for reduction sum. However, I saw that they were not correct, thanks for pointing it out! Actually the correct one is `0xFF * (64 + 63 + ... + 2 + 1) = 0x817E0` per whole 4-register-size group
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5096:
>
>> 5094: // Extracting results for:
>> 5095: // s1_new
>> 5096: __ vmv_x_s(temp0, vs1acc[0]);
>
> Note that `vmv_x_s` will sign-extend the `e16` reduction result in `vs1acc[0]`. Is that safe?
It is safe, yes. Left additional comment to clarify it up the code
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5207:
>
>> 5205: Register step = x28; // t3
>> 5206:
>> 5207: VectorRegister vzero = v4; // group: v5, v6, v7
>
> I see `vzero` is only used as the scalar source for vector integer reduction instructions, so it's not necessary for `vzero` to be a group of: v4, v5, v6, v7. Seems that we can assign the final v31 for `vzero` and thus free vector register group of v4, v5, v6, v7.
>
> And here is what the RVV spec says for reference:
>
> Vector reduction operations take a vector register group of elements and a scalar held in element 0
> of a vector register, and perform a reduction using some binary operator, to produce a scalar result
> in element 0 of a vector register. The scalar input and output operands are held in element 0 of a
> single vector register, not a vector register group, so any vector register can be the scalar source
> or destination of a vector reduction regardless of LMUL setting.
Done, thanks
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5217:
>
>> 5215: v16, v18, v20, v22
>> 5216: };
>> 5217: VectorRegister vtable_64 = v24; // group: v25, v26, v27
>
> Suggestion: `// group: v24, v25, v26, v27`
Fixed
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5220:
>
>> 5218: VectorRegister vtable_16 = (MaxVectorSize == 16) ? v27 : v30;
>> 5219: VectorRegister vtemp1 = v28; // group: v29, v30, v31
>> 5220: VectorRegister vtemp2 = v29;
>
> Similar for `vtemp1` and `vtemp2` which are only used as the scalar destination for vector integer reduction instructions: it's not necessary for them to be a vector register group. So you might want to remove the code comment for `vtemp1`.
Done
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5234:
>
>> 5232: __ vid_v(vtemp1);
>> 5233: __ vmv_v_x(vtable_64, temp1);
>> 5234: __ vsub_vv(vtable_64, vtable_64, vtemp1);
>
> I think a more simpler `vrsub_vx vtable_64, vtemp1, temp1` will do? This will help save the `vmv_v_x` instruction.
Thanks for pointing out, done!
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5245:
>
>> 5243: __ vid_v(vtemp1);
>> 5244: __ vmv_v_x(vtable_16, temp1);
>> 5245: __ vsub_vv(vtable_16, vtable_16, vtemp1);
>
> Similar here: `vrsub_vx vtable_16, vtemp1, temp1`.
Done
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624927998
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624927850
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624927907
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624928180
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624928198
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624927931
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624928147
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1624928135
More information about the hotspot-compiler-dev
mailing list