RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v2]

Wed Dec 20 02:33:54 UTC 2023

On Fri, 15 Dec 2023 11:49:50 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.
>> 
>> ### Correctness checks
>> 
>> Tier 1/2 tests are ok.
>> 
>> ### Performance results on T-Head board
>> 
>> #### Results for enabled intrinsic:
>> 
>> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`
>> 
>> | Benchmark                                             |  (count) |  Mode | Cnt    | Score |   Error |  Units |
>> | --- | ---- | ----- | --- | ---- | --- | ---- |
>> | CRC32.TestCRC32.testCRC32Update  |     64  | thrpt     | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |    128 |  thrpt    | 24 | 2126.673 |  2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |    256 | thrpt    |  24 | 1134.330 |  6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |    512 | thrpt    |  24 |  584.017 |  2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |   2048 |  thrpt   |   24 |  151.173 |  0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |   16384 | thrpt |  24 |   19.113 |  0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |  65536 | thrpt  | 24  |   4.647 | 0.022 | ops/ms |
>> 
>> #### Results for disabled intrinsic:
>> 
>> | Benchmark                                            | (count)  |  Mode | Cnt |   Score  |  Error   | Units     |
>> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | 
>> | CRC32.TestCRC32.testCRC32Update |      64    |  thrpt   | 15  | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     128   |  thrpt   | 15  | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     256   |  thrpt   | 15  | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     512   |  thrpt   | 15  | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |    2048  |  thrpt   | 15  | 166.738 |  0.935  | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   16384 |  thrpt   | 15  |  25.060  | 0.034   | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   65536 |  thrpt   | 15  |   6.196   | 0.030   | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Use zero_extend instead of shifts where possible
>  - Use andn instead of notr + andr where possible
>  - Replace shNadd with one instruction in most cases

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3717:

> 3715:   andi(tmp1, v, bits8);
> 3716:   shadd(tmp1, tmp1, table3, tmp2, 2);
> 3717:   Assembler::lwu(crc, tmp1, 0);

Why not use `MacroAssembler::lwu` instead ? I see low difference in stub code emitted.
Like:
``` diff

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 06026b98bfa..eb9362ca531 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3696,26 +3696,26 @@ void MacroAssembler::update_word_crc32(Register crc, Register v, Register tmp1,
 
   andi(tmp1, v, bits8);
   shadd(tmp1, tmp1, table3, tmp2, 2);
-  Assembler::lwu(crc, tmp1, 0);
+  lwu(crc, Address(tmp1, 0));
 
   srli(tmp1, v, 6);
   andi(tmp1, tmp1, (bits8 << 2));
   add(tmp1, tmp1, table2);
-  Assembler::lwu(tmp2, tmp1, 0);
+  lwu(tmp2, Address(tmp1, 0));
 
   srli(tmp1, v, 14);
   xorr(crc, crc, tmp2);
 
   andi(tmp1, tmp1, (bits8 << 2));
   add(tmp1, tmp1, table1);
-  Assembler::lwu(tmp2, tmp1, 0);
+  lwu(tmp2, Address(tmp1, 0));
 
   srli(tmp1, v, 22);
   xorr(crc, crc, tmp2);
 
   andi(tmp1, tmp1, (bits8 << 2));
   add(tmp1, tmp1, table0);
-  Assembler::lwu(tmp2, tmp1, 0);
+  lwu(tmp2, Address(tmp1, 0));
   xorr(crc, crc, tmp2);
 }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1432182715