RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v2]

Hamlin Li mli at openjdk.org
Thu Dec 21 15:40:51 UTC 2023


On Wed, 20 Dec 2023 15:34:40 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3717:
>> 
>>> 3715:   andi(tmp1, v, bits8);
>>> 3716:   shadd(tmp1, tmp1, table3, tmp2, 2);
>>> 3717:   Assembler::lwu(crc, tmp1, 0);
>> 
>> Why not use `MacroAssembler::lwu` instead? I see no difference in the stub code emitted.
>> Like:
>> ``` diff
>> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> index 06026b98bfa..eb9362ca531 100644
>> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> @@ -3696,26 +3696,26 @@ void MacroAssembler::update_word_crc32(Register crc, Register v, Register tmp1,
>>  
>>    andi(tmp1, v, bits8);
>>    shadd(tmp1, tmp1, table3, tmp2, 2);
>> -  Assembler::lwu(crc, tmp1, 0);
>> +  lwu(crc, Address(tmp1, 0));
>>  
>>    srli(tmp1, v, 6);
>>    andi(tmp1, tmp1, (bits8 << 2));
>>    add(tmp1, tmp1, table2);
>> -  Assembler::lwu(tmp2, tmp1, 0);
>> +  lwu(tmp2, Address(tmp1, 0));
>>  
>>    srli(tmp1, v, 14);
>>    xorr(crc, crc, tmp2);
>>  
>>    andi(tmp1, tmp1, (bits8 << 2));
>>    add(tmp1, tmp1, table1);
>> -  Assembler::lwu(tmp2, tmp1, 0);
>> +  lwu(tmp2, Address(tmp1, 0));
>>  
>>    srli(tmp1, v, 22);
>>    xorr(crc, crc, tmp2);
>>  
>>    andi(tmp1, tmp1, (bits8 << 2));
>>    add(tmp1, tmp1, table0);
>> -  Assembler::lwu(tmp2, tmp1, 0);
>> +  lwu(tmp2, Address(tmp1, 0));
>>    xorr(crc, crc, tmp2);
>>  }
>
> When I tried `MacroAssembler::lwu` I got the following instructions on T-head:
> 
> 0.47%  ?  0x0000003fac6a8738:   li	t3,1
> 0.51%  ?  0x0000003fac6a873a:   slli	t3,t3,0x20
> 0.00%  ?  0x0000003fac6a873c:   addi	t3,t3,-1
> ...
> 2.68%  ?  0x0000003fac6a8752:   lw	a0,0(t1)
> 5.25%  ?  0x0000003fac6a8756:   and	a0,a0,t3
> ...
>        ?  0x0000003fac6a876a:   lw	t4,0(t1)
> 1.78%  ?  0x0000003fac6a876e:   and	t1,t4,t3
> ...
> 0.49%  ?  0x0000003fac6a8786:   lw	t4,0(t1)
> 2.62%  ?  0x0000003fac6a878a:   and	t1,t4,t3
> ...
> 0.41%  ?  0x0000003fac6a87a2:   lw	t4,0(t1)
> 3.97%  ?  0x0000003fac6a87a6:   and	t1,t4,t3
> 
> instead of just 
> 
> 4.52%  ??  0x0000003fb49e96f6:   lwu	a0,0(t1)
> ...
>        ??  0x0000003fb49e970a:   lwu	t3,0(t1)
> ...
>        ??  0x0000003fb49e9722:   lwu	t3,0(t1)
> ...
> 0.02%  ??  0x0000003fb49e973a:   lwu	t3,0(t1)

Interesting. I tried on both qemu and a `T-HEAD Light Lichee Pi 4A`, and I don't get this code generated with just `lwu(tmp2, Address(tmp1, 0));`.
Do you know how this happens? I mean, why would it happen only on specific hardware?
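
For reference, the two sequences quoted above should compute the same value: `lw` sign-extends the loaded word, and the following `and` with `t3` (built by `li`/`slli`/`addi` as 2^32 - 1) clears the upper 32 bits again, which is what a single `lwu` does directly. A minimal standalone C++ sketch of that equivalence follows. The function names are hypothetical and only model the instruction semantics; this is not HotSpot code and says nothing about why one path or the other gets selected.

```cpp
#include <cassert>
#include <cstdint>

// Models `lw`: load a 32-bit word and sign-extend it to 64 bits.
constexpr uint64_t load_lw(uint32_t word) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(word)));
}

// Models `lwu`: load a 32-bit word and zero-extend it to 64 bits.
constexpr uint64_t load_lwu(uint32_t word) {
  return static_cast<uint64_t>(word);
}

// Models `li t3,1; slli t3,t3,0x20; addi t3,t3,-1`, which yields 0xFFFFFFFF.
constexpr uint64_t low32_mask() {
  return (uint64_t{1} << 32) - 1;
}

// Models the longer sequence: `lw` followed by `and` with the mask.
constexpr uint64_t load_lw_and_mask(uint32_t word) {
  return load_lw(word) & low32_mask();
}

// Hypothetical helper: checks the equivalence for a few sample table words.
inline void check_equivalence() {
  const uint32_t samples[] = {0x00000000u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t w : samples) {
    assert(load_lw_and_mask(w) == load_lwu(w));
  }
}
```

So the longer sequence is functionally fine; it only costs the extra `and` per load plus the mask setup.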

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1434221095

