RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v10]
ArsenyBochkarev
duke at openjdk.org
Tue Apr 2 16:07:28 UTC 2024
On Tue, 2 Apr 2024 14:33:30 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Use srliw to clear upper bits for 'lower' cases
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1365:
>
>> 1363: shadd(tmp1, tmp1, table0, tmp1, 2);
>> 1364: lwu(tmp2, Address(tmp1));
>> 1365: xorr(crc, crc, tmp2);
>
> I witnessed slightly better JMH numbers on Lichee-PI-4A with the following sequence:
>
> if (upper)
>   srli(v, v, 32);
> xorr(v, v, crc);
>
> andi(tmp1, v, right_8_bits);
> shadd(tmp1, tmp1, table3, tmp2, 2);
> lwu(crc, Address(tmp1));
>
> srli(tmp1, v, 6);
> andi(tmp1, tmp1, (right_8_bits << 2));
> add(tmp1, tmp1, table2);
> lwu(tmp2, Address(tmp1));
>
> srli(tmp1, v, 14);
> andi(tmp1, tmp1, (right_8_bits << 2));
> add(tmp1, tmp1, table1);
> xorr(crc, crc, tmp2);
>
> lwu(tmp2, Address(tmp1));
> srliw(tmp1, v, 24);
> shadd(tmp1, tmp1, table0, tmp1, 2);
> xorr(crc, crc, tmp2);
>
> lwu(tmp2, Address(tmp1));
> xorr(crc, crc, tmp2);
Thanks for pointing it out! Some additional speedup can also be achieved by using `srli` instead of `srliw` for the '`upper`' cases:
On Lichee-Pi:
`srliw` only
| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| --------------------------------- | ------ | ----------- | --------- | -------- | --------- | ------ |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 6512.348 | 146.138 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 3408.306 | 279.986 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 1971.538 | 100.804 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 1040.091 | 3.426 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 272.233 | 3.844 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 33.781 | 1.961 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 8.399 | 0.042 | ops/ms |
`srli` and `srliw`
| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| --------------------------------- | ------ | ----------- | --------- | -------- | --------- | ------ |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 6561.674 | 104.461 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 3586.810 | 109.934 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 2024.515 | 16.118 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 1047.475 | 39.745 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 274.006 | 0.809 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 34.746 | 0.203 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 8.405 | 0.064 | ops/ms |
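For anyone following the thread without the patch at hand, here is a rough Java model of what the quoted sequence computes. It is only a sketch, not the HotSpot code: the class and method names (`Crc32SliceSketch`, `updateWord`, `srli`, `srliw`) are made up for illustration, the tables are generated rather than copied from `crc32.h`, and the use of `srli` for the top byte in the 'upper' case is my reading of the change, on the assumption that `crc` is kept zero-extended. `srliw` shifts only the low 32 bits of the register, so in the 'lower' case it discards the upper half of the 64-bit load; after `srli(v, v, 32)` those bits are already zero, so a plain `srli` gives the same index.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32SliceSketch {
    // TABLE[0] is the classic byte-wise table; TABLE[1..3] are the slicing tables.
    static final int[][] TABLE = buildTables();

    static int[][] buildTables() {
        int[][] t = new int[4][256];
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            t[0][n] = c;
        }
        for (int n = 0; n < 256; n++) {
            for (int k = 1; k < 4; k++) {
                int prev = t[k - 1][n];
                t[k][n] = (prev >>> 8) ^ t[0][prev & 0xff];
            }
        }
        return t;
    }

    // Model of RV64 srli: logical right shift of the full 64-bit register.
    static long srli(long x, int shamt) {
        return x >>> shamt;
    }

    // Model of RV64 srliw: logical right shift of the low 32 bits,
    // with the 32-bit result sign-extended to 64 bits.
    static long srliw(long x, int shamt) {
        return (long) (((int) x) >>> shamt);
    }

    // One 32-bit step. crc is the zero-extended 32-bit state,
    // data is a 64-bit little-endian load holding two 32-bit words.
    static long updateWord(long crc, long data, boolean upper) {
        long v = upper ? srli(data, 32) : data;
        v ^= crc;
        // 'lower': bits 32..63 of v still hold data, so srliw is needed.
        // 'upper': bits 32..63 of v are zero, so srli is enough (assumption, see above).
        long top = upper ? srli(v, 24) : srliw(v, 24);
        int r = TABLE[3][(int) (v & 0xff)]
              ^ TABLE[2][(int) (srli(v, 8) & 0xff)]
              ^ TABLE[1][(int) (srli(v, 16) & 0xff)]
              ^ TABLE[0][(int) top];
        return r & 0xFFFFFFFFL;
    }

    static long crc32(byte[] buf) {
        long crc = 0xFFFFFFFFL;
        int i = 0;
        for (; i + 8 <= buf.length; i += 8) {
            long data = 0;
            for (int b = 7; b >= 0; b--) {
                data = (data << 8) | (buf[i + b] & 0xffL); // little-endian load
            }
            crc = updateWord(crc, data, false);
            crc = updateWord(crc, data, true);
        }
        for (; i < buf.length; i++) { // byte-wise tail
            int idx = (int) ((crc ^ buf[i]) & 0xff);
            crc = (crc >>> 8) ^ (TABLE[0][idx] & 0xFFFFFFFFL);
        }
        return crc ^ 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "The quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.US_ASCII);
        CRC32 ref = new CRC32();
        ref.update(data);
        System.out.println(Long.toHexString(crc32(data)) + " vs "
                + Long.toHexString(ref.getValue()));
    }
}
```

Running `main` prints the sketch's result next to the one from `java.util.zip.CRC32`, which is a quick way to check that the table order (`table3` for the lowest byte, `table0` for the highest) matches the quoted sequence.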
> src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 60:
>
>> 58:
>> 59: /**
>> 60: * crc_table[] from jdk/src/share/native/java/util/zip/zlib-1.2.5/crc32.h
>
> I think the correct path should be: `jdk/src/java.base/share/native/libzip/zlib/crc32.h`
Fixed, thanks!
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548163980
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548164055