RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v10]

ArsenyBochkarev duke at openjdk.org
Tue Apr 2 16:07:28 UTC 2024


On Tue, 2 Apr 2024 14:33:30 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Use srliw to clear upper bits for 'lower' cases
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1365:
> 
>> 1363:   shadd(tmp1, tmp1, table0, tmp1, 2);
>> 1364:   lwu(tmp2, Address(tmp1));
>> 1365:   xorr(crc, crc, tmp2);
> 
> I witnessed slightly better JMH numbers on Lichee-PI-4A with the following sequence:
> 
>   if (upper)
>     srli(v, v, 32);
>   xorr(v, v, crc);
> 
>   andi(tmp1, v, right_8_bits);
>   shadd(tmp1, tmp1, table3, tmp2, 2);
>   lwu(crc, Address(tmp1));
> 
>   srli(tmp1, v, 6);
>   andi(tmp1, tmp1, (right_8_bits << 2));
>   add(tmp1, tmp1, table2);
>   lwu(tmp2, Address(tmp1));
> 
>   srli(tmp1, v, 14);
>   andi(tmp1, tmp1, (right_8_bits << 2));
>   add(tmp1, tmp1, table1);
>   xorr(crc, crc, tmp2);
> 
>   lwu(tmp2, Address(tmp1));
>   srliw(tmp1, v, 24);
>   shadd(tmp1, tmp1, table0, tmp1, 2);
>   xorr(crc, crc, tmp2);
> 
>   lwu(tmp2, Address(tmp1));
>   xorr(crc, crc, tmp2);

Thanks for pointing it out! A small additional speedup can be achieved by using `srli` for the '`upper`' cases:

On Lichee-Pi:
`srliw` only

| Benchmark                       | (count) | Mode  | Cnt |    Score |   Error | Units  |
| ------------------------------- | ------- | ----- | --- | -------- | ------- | ------ |
| CRC32.TestCRC32.testCRC32Update |      64 | thrpt |  12 | 6512.348 | 146.138 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     128 | thrpt |  12 | 3408.306 | 279.986 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     256 | thrpt |  12 | 1971.538 | 100.804 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     512 | thrpt |  12 | 1040.091 |   3.426 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |    2048 | thrpt |  12 |  272.233 |   3.844 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |   16384 | thrpt |  12 |   33.781 |   1.961 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |   65536 | thrpt |  12 |    8.399 |   0.042 | ops/ms |

`srli` and `srliw`

| Benchmark                       | (count) | Mode  | Cnt |    Score |   Error | Units  |
| ------------------------------- | ------- | ----- | --- | -------- | ------- | ------ |
| CRC32.TestCRC32.testCRC32Update |      64 | thrpt |  12 | 6561.674 | 104.461 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     128 | thrpt |  12 | 3586.810 | 109.934 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     256 | thrpt |  12 | 2024.515 |  16.118 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     512 | thrpt |  12 | 1047.475 |  39.745 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |    2048 | thrpt |  12 |  274.006 |   0.809 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |   16384 | thrpt |  12 |   34.746 |   0.203 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |   65536 | thrpt |  12 |    8.405 |   0.064 | ops/ms |
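
For readers following the quoted instruction sequence, the word-at-a-time table lookup it performs can be modeled in plain C++. This is only a sketch under stated assumptions, not the stub code: the names `make_tables`, `update_word` and `crc32_sketch` are made up for illustration, the four tables are generated at runtime instead of being taken from `stubRoutines_riscv.cpp`, and the 64-bit load is assumed to be little-endian. The `uint32_t` truncation and the `>> 32` play the roles of `srliw` (for the 'lower' word) and `srli` (for the 'upper' word).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four 256-entry tables for the reflected CRC-32 polynomial 0xEDB88320.
// t[0] is the classic byte-at-a-time table; t[k] advances an entry of
// t[k-1] by one more zero byte (zlib-style slicing-by-4).
using Crc32Tables = std::array<std::array<uint32_t, 256>, 4>;

inline Crc32Tables make_tables() {
  Crc32Tables t{};
  for (uint32_t n = 0; n < 256; n++) {
    uint32_t c = n;
    for (int bit = 0; bit < 8; bit++) {
      c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
    }
    t[0][n] = c;
  }
  for (uint32_t n = 0; n < 256; n++) {
    for (int k = 1; k < 4; k++) {
      uint32_t prev = t[k - 1][n];
      t[k][n] = t[0][prev & 0xff] ^ (prev >> 8);
    }
  }
  return t;
}

// Reference path: one input byte per step.
inline uint32_t update_byte(const Crc32Tables& t, uint32_t crc, uint8_t b) {
  return t[0][(crc ^ b) & 0xff] ^ (crc >> 8);
}

// One 32-bit word per step, mirroring the quoted sequence: v ^= crc, then
// the lowest byte of v indexes table3 and the highest byte indexes table0.
inline uint32_t update_word(const Crc32Tables& t, uint32_t crc, uint32_t word) {
  uint32_t v = word ^ crc;
  return t[3][v & 0xff] ^ t[2][(v >> 8) & 0xff] ^
         t[1][(v >> 16) & 0xff] ^ t[0][v >> 24];
}

// Hypothetical driver: each 8-byte chunk is consumed as a 'lower' word
// followed by an 'upper' word; leftover bytes use the byte-wise path.
inline uint32_t crc32_sketch(const uint8_t* buf, size_t len) {
  assert(buf != nullptr || len == 0);
  static const Crc32Tables t = make_tables();
  uint32_t crc = 0xFFFFFFFFu;
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    uint64_t x = 0;
    for (int j = 0; j < 8; j++) {
      x |= static_cast<uint64_t>(buf[i + j]) << (8 * j);  // little-endian load
    }
    crc = update_word(t, crc, static_cast<uint32_t>(x));        // 'lower'
    crc = update_word(t, crc, static_cast<uint32_t>(x >> 32));  // 'upper'
  }
  for (; i < len; i++) {
    crc = update_byte(t, crc, buf[i]);
  }
  return ~crc;
}
```

The four lookups in `update_word` are independent of each other once `v` is known, which is presumably why interleaving the address computations with the loads, as in the quoted sequence, helps on an in-order or narrow core.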

> src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 60:
> 
>> 58: 
>> 59: /**
>> 60:  *  crc_table[] from jdk/src/share/native/java/util/zip/zlib-1.2.5/crc32.h
> 
> I think the correct path should be: `jdk/src/java.base/share/native/libzip/zlib/crc32.h`

Fixed, thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548163980
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548164055

