RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v2]
ArsenyBochkarev
duke at openjdk.org
Wed Dec 20 15:37:50 UTC 2023
On Wed, 20 Dec 2023 02:30:40 GMT, Gui Cao <gcao at openjdk.org> wrote:
>> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>>
>> - Use zero_extend instead of shifts where possible
>> - Use andn instead of notr + andr where possible
>> - Replace shNadd with one instruction in most cases
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3717:
>
>> 3715: andi(tmp1, v, bits8);
>> 3716: shadd(tmp1, tmp1, table3, tmp2, 2);
>> 3717: Assembler::lwu(crc, tmp1, 0);
>
> Why not use `MacroAssembler::lwu` instead? I see no difference in the stub code emitted.
> Like:
> ``` diff
> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
> index 06026b98bfa..eb9362ca531 100644
> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
> @@ -3696,26 +3696,26 @@ void MacroAssembler::update_word_crc32(Register crc, Register v, Register tmp1,
>
>    andi(tmp1, v, bits8);
>    shadd(tmp1, tmp1, table3, tmp2, 2);
> -  Assembler::lwu(crc, tmp1, 0);
> +  lwu(crc, Address(tmp1, 0));
>
>    srli(tmp1, v, 6);
>    andi(tmp1, tmp1, (bits8 << 2));
>    add(tmp1, tmp1, table2);
> -  Assembler::lwu(tmp2, tmp1, 0);
> +  lwu(tmp2, Address(tmp1, 0));
>
>    srli(tmp1, v, 14);
>    xorr(crc, crc, tmp2);
>
>    andi(tmp1, tmp1, (bits8 << 2));
>    add(tmp1, tmp1, table1);
> -  Assembler::lwu(tmp2, tmp1, 0);
> +  lwu(tmp2, Address(tmp1, 0));
>
>    srli(tmp1, v, 22);
>    xorr(crc, crc, tmp2);
>
>    andi(tmp1, tmp1, (bits8 << 2));
>    add(tmp1, tmp1, table0);
> -  Assembler::lwu(tmp2, tmp1, 0);
> +  lwu(tmp2, Address(tmp1, 0));
>    xorr(crc, crc, tmp2);
>  }
> ```

When I tried `MacroAssembler::lwu`, I got the following instructions on T-Head:

  0.47%   0x0000003fac6a8738: li t3,1
  0.51%   0x0000003fac6a873a: slli t3,t3,0x20
  0.00%   0x0000003fac6a873c: addi t3,t3,-1
  ...
  2.68%   0x0000003fac6a8752: lw a0,0(t1)
  5.25%   0x0000003fac6a8756: and a0,a0,t3
  ...
          0x0000003fac6a876a: lw t4,0(t1)
  1.78%   0x0000003fac6a876e: and t1,t4,t3
  ...
  0.49%   0x0000003fac6a8786: lw t4,0(t1)
  2.62%   0x0000003fac6a878a: and t1,t4,t3
  ...
  0.41%   0x0000003fac6a87a2: lw t4,0(t1)
  3.97%   0x0000003fac6a87a6: and t1,t4,t3

instead of just

  4.52%   0x0000003fb49e96f6: lwu a0,0(t1)
  ...
          0x0000003fb49e970a: lwu t3,0(t1)
  ...
          0x0000003fb49e9722: lwu t3,0(t1)
  ...
  0.02%   0x0000003fb49e973a: lwu t3,0(t1)
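
To make the difference concrete: in the first listing, `li` / `slli` / `addi` build the mask 0xFFFFFFFF in t3, and each `lw` + `and` pair then clears the sign-extended upper 32 bits, which gives the same value a single `lwu` produces. Below is a minimal C++ model of that equivalence. The helper names are hypothetical and only model the register semantics on RV64; this is not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models `lw rd, 0(rs)` on RV64: load 32 bits, sign-extend to 64 bits.
inline uint64_t load_lw(const void* addr) {
  int32_t w;
  std::memcpy(&w, addr, sizeof w);
  return static_cast<uint64_t>(static_cast<int64_t>(w));
}

// Models `lwu rd, 0(rs)` on RV64: load 32 bits, zero-extend to 64 bits.
inline uint64_t load_lwu(const void* addr) {
  uint32_t w;
  std::memcpy(&w, addr, sizeof w);
  return w;
}

// Models the mask setup: li t3,1; slli t3,t3,0x20; addi t3,t3,-1.
inline uint64_t low32_mask() {
  uint64_t t3 = 1;
  t3 <<= 0x20;
  t3 -= 1;
  return t3;  // 0x00000000FFFFFFFF
}

// Models the two-instruction sequence `lw` followed by `and ..., t3`.
inline uint64_t load_lw_then_mask(const void* addr) {
  return load_lw(addr) & low32_mask();
}

// True when both sequences yield the same 64-bit register value.
inline bool sequences_agree(uint32_t table_entry) {
  return load_lw_then_mask(&table_entry) == load_lwu(&table_entry);
}

// Usage: a sample 32-bit value with bit 31 set, where `lw` alone would
// leave the upper 32 bits filled with ones.
inline void check_example() {
  uint32_t entry = 0xEDB88320u;
  assert(load_lw(&entry) == 0xFFFFFFFFEDB88320ull);
  assert(load_lwu(&entry) == 0x00000000EDB88320ull);
  assert(sequences_agree(entry));
}
```

The results match, so the only difference is cost: the masked variant adds one `and` per table lookup plus the three-instruction mask setup.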
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1432869971