RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v9]

ArsenyBochkarev duke at openjdk.org
Tue Mar 19 11:53:26 UTC 2024


On Tue, 19 Mar 2024 11:16:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.
>> 
>> ### Correctness checks
>> 
>> Tier 1 and tier 2 tests pass.
>> 
>> ### Performance results on T-Head board
>> 
>> #### Results for enabled intrinsic:
>> 
>> The benchmark used is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`.
>> 
>> | Benchmark                                             |  (count) |  Mode | Cnt    | Score |   Error |  Units |
>> | --- | ---- | ----- | --- | ---- | --- | ---- |
>> | CRC32.TestCRC32.testCRC32Update  |     64  | thrpt     | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |    128 |  thrpt    | 24 | 2126.673 |  2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |    256 | thrpt    |  24 | 1134.330 |  6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |    512 | thrpt    |  24 |  584.017 |  2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |   2048 |  thrpt   |   24 |  151.173 |  0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |   16384 | thrpt |  24 |   19.113 |  0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update  |  65536 | thrpt  | 24  |   4.647 | 0.022 | ops/ms |
>> 
>> #### Results for disabled intrinsic:
>> 
>> | Benchmark                                            | (count)  |  Mode | Cnt |   Score  |  Error   | Units     |
>> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | 
>> | CRC32.TestCRC32.testCRC32Update |      64    |  thrpt   | 15  | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     128   |  thrpt   | 15  | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     256   |  thrpt   | 15  | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |     512   |  thrpt   | 15  | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |    2048  |  thrpt   | 15  | 166.738 |  0.935  | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   16384 |  thrpt   | 15  |  25.060  | 0.034   | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update |   65536 |  thrpt   | 15  |   6.196   | 0.030   | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Optimize last 'upper' load in update_word_crc32

I managed to get some additional speedup for cases with `Zba` enabled. Updated results for the enabled intrinsic on StarFive VisionFive2:

| Benchmark                        | (count) |  Mode | Cnt |    Score |  Error |  Units |
| -------------------------------- | ------- | ----- | --- | -------- | ------ | ------ |
| CRC32.TestCRC32.testCRC32Update  |      64 | thrpt |  12 | 4231.837 | 12.249 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |     128 | thrpt |  12 | 2678.843 |  1.631 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |     256 | thrpt |  12 | 1405.024 |  6.509 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |     512 | thrpt |  12 |  727.608 |  1.393 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |    2048 | thrpt |  12 |  186.552 |  0.389 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |   16384 | thrpt |  12 |   23.423 |  0.087 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |   65536 | thrpt |  12 |    5.493 |  0.015 | ops/ms |

Results for the disabled intrinsic are [here](https://github.com/openjdk/jdk/pull/17046#issuecomment-1850364667).
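
For context on what the `testCRC32Update` rows measure, here is a minimal sketch of a plain byte-at-a-time, table-driven CRC32 in Java, checked against `java.util.zip.CRC32`. It is only an illustration of the "plain (non-vectorized, no Zbc)" idea and is not the stub code from this patch; the class name `Crc32Sketch` and its methods are hypothetical, and the benchmark body is assumed to boil down to an `update(byte[], int, int)` call over `count` bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration class, not part of the JDK or of this PR.
final class Crc32Sketch {
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static final int[] TABLE = new int[256];

    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Processes one byte per iteration: one table load, one shift, two XORs.
    static long crc32(byte[] buf, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc = TABLE[(crc ^ buf[i]) & 0xFF] ^ (crc >>> 8);
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // Same computation through the JDK class, which is where an intrinsic would apply.
    static long jdkCrc32(byte[] buf, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(buf, off, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("sketch=%08x jdk=%08x%n",
                crc32(data, 0, data.length), jdkCrc32(data, 0, data.length));
    }
}
```

Both values should agree for any input; the standard check value for the ASCII string `123456789` is `cbf43926`.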

> performance numbers on unmatched

Current data for HiFive Unmatched (no `Zba` here!):

Enabled intrinsic:

| Benchmark                   |     (count) |  Mode | Cnt   |  Score  |  Error  | Units |
| -------------------------------- | ---------- | -------- | ------   | ------- | ----------- | ------- |
| CRC32.TestCRC32.testCRC32Update  |     64 | thrpt |  12 | 3180.082 | ± 63.442 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |    128 | thrpt |  12 | 1936.728 | ± 17.332 |  ops/ms |
| CRC32.TestCRC32.testCRC32Update  |    256 | thrpt |  12 | 1019.500 | ±  5.038 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |    512 | thrpt  | 12 |  527.775 | ±  2.059 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |   2048 | thrpt  | 12  | 135.190 | ±  0.279 | ops/ms |
| CRC32.TestCRC32.testCRC32Update  |  16384 | thrpt |  12  |  16.996 | ±  0.066 |  ops/ms |
| CRC32.TestCRC32.testCRC32Update  |  65536 | thrpt |  12   |  3.877 | ±  0.011 | ops/ms |

Disabled intrinsic:

| Benchmark                       | (count) |  Mode  | Cnt  |  Score  |  Error |  Units |
| ------ | ------------ | ----------- | -------- | -------- |-------- | ----- |
| CRC32.TestCRC32.testCRC32Update |      64 | thrpt |  12 | 992.300 | ± 17.666 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     128 | thrpt |  12 | 818.234 | ±  9.767 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     256 | thrpt  | 12 | 605.509 | ± 14.685 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |     512 | thrpt  | 12 | 402.414 | ±  4.331 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |    2048 | thrpt |  12 | 134.390 | ±  1.399 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |   16384 | thrpt  | 12 |  18.619 | ±  0.104 | ops/ms |
| CRC32.TestCRC32.testCRC32Update |   65536 | thrpt  | 12 |   4.229 | ±  0.020 | ops/ms |
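
To make the two HiFive Unmatched tables easier to compare, the following sketch computes the enabled/disabled throughput ratio per `count`. The scores are copied from the tables above; the class name `SpeedupSketch` and its helper are hypothetical, and the error columns are ignored, so treat the ratios as rough indicators only. A ratio below 1 means the intrinsic run scored lower than the non-intrinsic run for that size.

```java
import java.util.Locale;

// Hypothetical helper for reading the tables; not part of the benchmark.
final class SpeedupSketch {
    static final int[] COUNTS = {64, 128, 256, 512, 2048, 16384, 65536};

    // Scores in ops/ms, copied from the HiFive Unmatched tables above.
    static final double[] ENABLED =
            {3180.082, 1936.728, 1019.500, 527.775, 135.190, 16.996, 3.877};
    static final double[] DISABLED =
            {992.300, 818.234, 605.509, 402.414, 134.390, 18.619, 4.229};

    // Element-wise ratio enabled[i] / disabled[i].
    static double[] speedups(double[] enabled, double[] disabled) {
        if (enabled.length != disabled.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double[] ratios = new double[enabled.length];
        for (int i = 0; i < enabled.length; i++) {
            ratios[i] = enabled[i] / disabled[i];
        }
        return ratios;
    }

    public static void main(String[] args) {
        double[] ratios = speedups(ENABLED, DISABLED);
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.printf(Locale.ROOT, "count=%6d  enabled/disabled=%.2fx%n",
                    COUNTS[i], ratios[i]);
        }
    }
}
```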

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2006979085


More information about the hotspot-compiler-dev mailing list