RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11]

Hamlin Li mli at openjdk.org
Wed Apr 3 11:03:10 UTC 2024


On Tue, 2 Apr 2024 16:07:27 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.
>> 
>> ### Correctness checks
>> 
>> Tier 1 and tier 2 tests pass.
>> 
>> ### Performance results on T-Head board
>> 
>> #### Results for enabled intrinsic:
>> 
>> The benchmark used is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`.
>> 
>> | Benchmark                       | (count) | Mode  | Cnt |    Score |  Error | Units  |
>> | ------------------------------- | ------- | ----- | --- | -------- | ------ | ------ |
>> | CRC32.TestCRC32.testCRC32Update |      64 | thrpt |  24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     128 | thrpt |  24 | 2126.673 |  2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     256 | thrpt |  24 | 1134.330 |  6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     512 | thrpt |  24 |  584.017 |  2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |    2048 | thrpt |  24 |  151.173 |  0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   16384 | thrpt |  24 |   19.113 |  0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   65536 | thrpt |  24 |    4.647 |  0.022 | ops/ms |
>> 
>> #### Results for disabled intrinsic:
>> 
>> | Benchmark                       | (count) | Mode  | Cnt |   Score |  Error | Units  |
>> | ------------------------------- | ------- | ----- | --- | ------- | ------ | ------ |
>> | CRC32.TestCRC32.testCRC32Update |      64 | thrpt |  15 | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     128 | thrpt |  15 | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     256 | thrpt |  15 | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     512 | thrpt |  15 | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |    2048 | thrpt |  15 | 166.738 |  0.935 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   16384 | thrpt |  15 |  25.060 |  0.034 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   65536 | thrpt |  15 |   6.196 |  0.030 | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Schedule instructions better
>  - Fix crc32.h path

Thanks for the update.
It looks fine to me, but I'm not fully sure; it would be good to hear what other reviewers think.

Just FYI: the performance gain from this implementation shrinks as the data size grows, so I wonder whether the CRC algorithm used here is optimal. There seem to be more advanced algorithms that are expected to bring larger gains, and some of them are already implemented for other platforms in the JDK.
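
To make the trend concrete, the sketch below divides the "enabled" score by the "disabled" score for each `(count)` row of the two tables quoted above. The class and method names (`Crc32Speedup`, `speedup`) are made up for illustration and are not part of the patch or of the JMH benchmark. Note the assumptions: the two tables use different iteration counts (Cnt 24 vs 15), and the ratio ignores the reported error margins, so treat the result as a rough reading only.

```java
// Illustrative only: ratios derived from the scores quoted in this thread.
public class Crc32Speedup {
    // (count) values from the benchmark tables.
    static final int[] COUNTS = {64, 128, 256, 512, 2048, 16384, 65536};

    // Scores in ops/ms with the intrinsic enabled.
    static final double[] ENABLED = {
        3730.929, 2126.673, 1134.330, 584.017, 151.173, 19.113, 4.647
    };

    // Scores in ops/ms with the intrinsic disabled.
    static final double[] DISABLED = {
        798.365, 677.756, 552.781, 429.304, 166.738, 25.060, 6.196
    };

    // Throughput ratio enabled/disabled for row i; values above 1 mean
    // the intrinsic is faster, values below 1 mean it is slower.
    static double speedup(int i) {
        return ENABLED[i] / DISABLED[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.printf("%6d  %.2fx%n", COUNTS[i], speedup(i));
        }
    }
}
```

On these numbers the ratio falls steadily, from roughly 4.7x at a count of 64 to about 0.75x at 65536, and it is already below 1 at 2048. If the measurements are comparable, that would mean the intrinsic helps only for the smaller inputs.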

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2034256054


More information about the hotspot-compiler-dev mailing list