RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11]

ArsenyBochkarev duke at openjdk.org
Tue Apr 2 19:01:10 UTC 2024


On Tue, 2 Apr 2024 16:30:27 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Schedule instructions better
>>  - Fix crc32.h path
>
> Thanks for the update and the continuous refinement.
> 
> ![image](https://github.com/openjdk/jdk/assets/10797965/4ed5ccc7-cc4b-431d-8c7d-ae829bd2c43b)
> 
> It seems the performance gain (last column in the picture) introduced by the intrinsic gets smaller and smaller as the data size increases.
> So IMHO, when the data size is big enough, it brings a performance regression rather than a performance gain.

@Hamlin-Li I modified `test/micro/org/openjdk/bench/java/util/TestCRC32C.java` a bit to see whether the regression appears with larger data sizes, and ran it on a VisionFive2 with Zba enabled:

Enabled intrinsic
| Benchmark                       | (count) | Mode  | Cnt | Score | Error | Units  |
| ------------------------------- | ------- | ----- | --- | ----- | ----- | ------ |
| CRC32.TestCRC32.testCRC32Update | 131072  | thrpt | 40  | 2.841 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 262144  | thrpt | 40  | 1.420 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 524288  | thrpt | 40  | 0.709 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2097152 | thrpt | 40  | 0.176 | 0.001 | ops/ms |

Disabled intrinsic
| Benchmark                       | (count) | Mode  | Cnt | Score | Error | Units  |
| ------------------------------- | ------- | ----- | --- | ----- | ----- | ------ |
| CRC32.TestCRC32.testCRC32Update | 131072  | thrpt | 40  | 2.729 | 0.003 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 262144  | thrpt | 40  | 1.367 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 524288  | thrpt | 40  | 0.684 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2097152 | thrpt | 40  | 0.170 | 0.001 | ops/ms |

| (count) | enabled/disabled |
| ------- | ---------------- |
| 131072  | 1.041040674      |
| 262144  | 1.038771031      |
| 524288  | 1.036549708      |
| 2097152 | 1.035294118      |
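
For reference, the ratio column is just the enabled score divided by the disabled score for each `(count)`. A minimal Python sketch that reproduces it from the two tables above; the `speedup` helper and the dictionary names are only illustrative and are not part of the benchmark or the patch:

```python
# Scores (ops/ms) copied from the two tables above, keyed by (count).
enabled = {131072: 2.841, 262144: 1.420, 524288: 0.709, 2097152: 0.176}
disabled = {131072: 2.729, 262144: 1.367, 524288: 0.684, 2097152: 0.170}


def speedup(enabled_scores, disabled_scores):
    """Return {count: enabled/disabled} for counts present in both runs."""
    return {
        count: enabled_scores[count] / disabled_scores[count]
        for count in enabled_scores
        if count in disabled_scores
    }


ratios = speedup(enabled, disabled)
for count, ratio in ratios.items():
    # A ratio above 1.0 means the intrinsic has higher throughput.
    print(f"{count:>8} | {ratio:.9f}")
```

A ratio above 1.0 means the run with the intrinsic enabled is faster. Note that the scores are rounded to three decimals, so the trailing digits of the ratios say little, especially for the largest size.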

Since there are no regressions compared to C2-generated code with `-XX:+UseZba`, even at the larger sizes, how about making the intrinsic Zba-exclusive?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2032833530


More information about the hotspot-compiler-dev mailing list