RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v9]
ArsenyBochkarev
duke at openjdk.org
Tue Mar 19 11:53:26 UTC 2024
On Tue, 19 Mar 2024 11:16:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:
>> Hi everyone! Please review this port of the [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no `Zbc`) version.
>>
>> ### Correctness checks
>>
>> Tier 1/2 tests pass.
>>
>> ### Performance results on T-Head board
>>
>> #### Results with the intrinsic enabled:
>>
>> The benchmark used is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`.
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | ---- | ----- | --- | ---- | --- | ---- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms |
>>
>> #### Results with the intrinsic disabled:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
>
> Optimize last 'upper' load in update_word_crc32
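For context on the plain (no `Zbc`) path: without carry-less multiplication, CRC32 is usually computed with lookup tables, and the name `update_word_crc32` in the commit title suggests a word-at-a-time table step. Below is a minimal Java sketch of slicing-by-4 (four 256-entry tables, one 32-bit little-endian word per iteration) for the reflected CRC-32 polynomial `0xEDB88320`. It illustrates the general technique under that assumption and is not the code of the RISC-V or AArch64 stubs. The class name `SliceBy4Crc32` is hypothetical. The sketch is cross-checked against `java.util.zip.CRC32`, the class whose methods these intrinsics accelerate.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration of slicing-by-4 CRC-32; not the HotSpot stub code.
class SliceBy4Crc32 {
    // T[0] is the classic byte-at-a-time table for the reflected polynomial 0xEDB88320.
    // T[k][i] is T[k-1][i] advanced through one more zero byte.
    static final int[][] T = new int[4][256];

    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            T[0][i] = c;
        }
        for (int k = 1; k < 4; k++) {
            for (int i = 0; i < 256; i++) {
                int prev = T[k - 1][i];
                T[k][i] = (prev >>> 8) ^ T[0][prev & 0xFF];
            }
        }
    }

    /** Continues a CRC-32 over buf[off, off+len); pass 0 to start. Chainable like java.util.zip.CRC32. */
    static int update(int crc, byte[] buf, int off, int len) {
        crc = ~crc;                      // undo the final inversion of the previous call
        int i = off;
        int end = off + len;
        // Main loop: XOR in one little-endian 32-bit word, then do 4 table lookups.
        while (end - i >= 4) {
            int word = (buf[i] & 0xFF)
                    | (buf[i + 1] & 0xFF) << 8
                    | (buf[i + 2] & 0xFF) << 16
                    | (buf[i + 3] & 0xFF) << 24;
            crc ^= word;
            crc = T[3][crc & 0xFF]
                ^ T[2][(crc >>> 8) & 0xFF]
                ^ T[1][(crc >>> 16) & 0xFF]
                ^ T[0][crc >>> 24];
            i += 4;
        }
        // Tail: the remaining 0-3 bytes, one lookup per byte.
        while (i < end) {
            crc = (crc >>> 8) ^ T[0][(crc ^ buf[i]) & 0xFF];
            i++;
        }
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 ref = new CRC32();
        ref.update(data, 0, data.length);
        // Both values should be cbf43926, the standard CRC-32 check value.
        System.out.printf("%08x %08x%n", update(0, data, 0, data.length), ref.getValue());
    }
}
```

The word loop handles 4 bytes per iteration at the cost of 4 KiB of tables. This is the usual trade-off for a table-driven CRC when no carry-less multiply instruction is available.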
I managed to get some additional speedup for cases where `Zba` is enabled. Updated data for the StarFive VisionFive2 with the intrinsic enabled:

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 4231.837 | 12.249 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 2678.843 | 1.631 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 1405.024 | 6.509 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 727.608 | 1.393 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 186.552 | 0.389 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 23.423 | 0.087 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 5.493 | 0.015 | ops/ms |
Results with the intrinsic disabled are [here](https://github.com/openjdk/jdk/pull/17046#issuecomment-1850364667).
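The RISC-V `Zba` extension adds `sh1add`/`sh2add`/`sh3add`, which compute `rs2 + (rs1 << n)` in a single instruction. The message does not say which instructions the change affects, so the following is an assumption. A natural place for `Zba` in a table-driven CRC is forming the address of a 4-byte table entry, `table + (index << 2)`, which otherwise takes a shift plus an add. The Java below only models the two sequences as integer arithmetic to show that they produce the same address. The class name, method names, and table base address are hypothetical.

```java
// Hypothetical model of RISC-V address arithmetic for a 32-bit table lookup.
// It only shows that the two sequences agree; it says nothing about cycle counts.
class Sh2AddModel {
    /** Base ISA: slli t0, idx, 2 ; add addr, table, t0 (two instructions). */
    static long addrShiftThenAdd(long table, long idx) {
        long t0 = idx << 2;   // slli
        return table + t0;    // add
    }

    /** Zba: sh2add addr, idx, table computes table + (idx << 2) in one instruction. */
    static long addrSh2add(long idx, long table) {
        return table + (idx << 2);
    }

    public static void main(String[] args) {
        long table = 0x4000_1000L;       // made-up table base address
        int crc = 0xCBF43926;
        long idx = crc & 0xFF;           // low byte selects one of 256 entries (andi)
        // prints "40001098 40001098"
        System.out.printf("%x %x%n", addrShiftThenAdd(table, idx), addrSh2add(idx, table));
    }
}
```

Suppose each of the four lookups per word in a slicing-by-4 style loop needs such an address. In that case `Zba` could save up to four instructions per word. Whether that is where the measured gain comes from is not stated in the message.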
> performance numbers on unmatched
Current data for the HiFive Unmatched, which does not have `Zba`:
Enabled intrinsic:
| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| -------------------------------- | ---------- | -------- | ------ | ------- | ----------- | ------- |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 3180.082 | ± 63.442 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 1936.728 | ± 17.332 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 1019.500 | ± 5.038 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 527.775 | ± 2.059 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 135.190 | ± 0.279 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 16.996 | ± 0.066 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 3.877 | ± 0.011 | ops/ms |
Disabled intrinsic:
| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ------ | ------------ | ----------- | -------- | -------- |-------- | ----- |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 992.300 | ± 17.666 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 818.234 | ± 9.767 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 605.509 | ± 14.685 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 402.414 | ± 4.331 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 134.390 | ± 1.399 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 18.619 | ± 0.104 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 4.229 | ± 0.020 | ops/ms |
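To read the two Unmatched tables together, the sketch below divides enabled by disabled throughput. To allow for noise, it counts a row as a win or a loss only when the reported intervals (score ± error) do not overlap. This is a crude check, not a statistical test. The numbers are copied from the tables above, and the class name is hypothetical.

```java
// Hypothetical helper: compares intrinsic-on vs intrinsic-off JMH rows
// copied from the HiFive Unmatched tables above.
class UnmatchedComparison {
    static final int[] COUNT = {64, 128, 256, 512, 2048, 16384, 65536};
    static final double[] ON = {3180.082, 1936.728, 1019.500, 527.775, 135.190, 16.996, 3.877};
    static final double[] ON_ERR = {63.442, 17.332, 5.038, 2.059, 0.279, 0.066, 0.011};
    static final double[] OFF = {992.300, 818.234, 605.509, 402.414, 134.390, 18.619, 4.229};
    static final double[] OFF_ERR = {17.666, 9.767, 14.685, 4.331, 1.399, 0.104, 0.020};

    /** Throughput ratio; above 1.0 means the intrinsic is faster. */
    static double ratio(int i) {
        return ON[i] / OFF[i];
    }

    /** "faster"/"slower" only if the score ± error intervals are disjoint, otherwise "overlap". */
    static String verdict(int i) {
        if (ON[i] - ON_ERR[i] > OFF[i] + OFF_ERR[i]) return "faster";
        if (ON[i] + ON_ERR[i] < OFF[i] - OFF_ERR[i]) return "slower";
        return "overlap";
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNT.length; i++) {
            System.out.printf("count=%6d ratio=%.2f %s%n", COUNT[i], ratio(i), verdict(i));
        }
    }
}
```

By this reading, the intrinsic is clearly faster up to 512 bytes (about 3.2x at 64 bytes). At 2048 bytes the two are indistinguishable. At 16384 and 65536 bytes the intrinsic is roughly 8 to 9% slower on this board.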
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2006979085