RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11]
Hamlin Li
mli at openjdk.org
Tue Apr 2 16:33:11 UTC 2024
On Tue, 2 Apr 2024 16:07:27 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:
>> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.
>>
>> ### Correctness checks
>>
>> Tier 1/2 tests are ok.
>>
>> ### Performance results on T-Head board
>>
>> #### Results for enabled intrinsic:
>>
>> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | ---- | ----- | --- | ---- | --- | ---- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms |
>>
>> #### Results for disabled intrinsic:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
>
> - Schedule instructions better
> - Fix crc32.h path
Thanks for updating and continuous refinement.

Seems the performance gain (last column in the picture) introduced by intrinsic is getting less and less when the data size increasing.
So IMHO, when data size is big enough, it brings performance regression rather than performance gain.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2032522714
More information about the hotspot-compiler-dev
mailing list