RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v2]
Hamlin Li
mli at openjdk.org
Thu Dec 21 11:40:52 UTC 2023
On Fri, 15 Dec 2023 11:49:50 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:
>> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.
>>
>> ### Correctness checks
>>
>> Tier 1/2 tests are ok.
>>
>> ### Performance results on T-Head board
>>
>> #### Results for enabled intrinsic:
>>
>> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | ---- | ----- | --- | ---- | --- | ---- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms |
>>
>> #### Results for disabled intrinsic:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>
> - Use zero_extend instead of shifts where possible
> - Use andn instead of notr + andr where possible
> - Replace shNadd with one instruction in most cases
Some minor comments
src/hotspot/cpu/riscv/c1_LIRGenerator_riscv.cpp line 775:
> 773:
> 774: void LIRGenerator::do_update_CRC32(Intrinsic* x) {
> 775: assert(UseCRC32Intrinsics, "why are we here?");
I suppose the performance data is for C2 intrinsic, is there performance data for C1?
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3719:
> 3717: Assembler::lwu(crc, tmp1, 0);
> 3718:
> 3719: srli(tmp1, v, 6);
The comment at the beginning is `crc = table3[v&0xff]^table2[(v>>8)&0xff]^table1[(v>>16)&0xff]^table0[v>>24]`, i.e. shifts of v are `6/14/22`, but in the code here shift is `6/14/22` + `bits8 << 2`, is this intended? Can you add some comments here?
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3760:
> 3758: add(table1, table0, 1*256*sizeof(juint), tmp);
> 3759: add(table2, table0, 2*256*sizeof(juint), tmp);
> 3760: add(table3, table0, 3*256*sizeof(juint), tmp);
With `add(table3, table2, 256*(junit))`, it might save one instruction.
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3762:
> 3760: add(table3, table0, 3*256*sizeof(juint), tmp);
> 3761:
> 3762: bind(L_by16);
seems `L_by16` is not necessary.
-------------
PR Review: https://git.openjdk.org/jdk/pull/17046#pullrequestreview-1792785890
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1433950783
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1433950997
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1433951041
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1433951122
More information about the hotspot-compiler-dev
mailing list