RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11]
ArsenyBochkarev
duke at openjdk.org
Tue Apr 2 19:01:10 UTC 2024
On Tue, 2 Apr 2024 16:30:27 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Schedule instructions better
>> - Fix crc32.h path
>
> Thanks for updating and continuous refinement.
>
> ![image](https://github.com/openjdk/jdk/assets/10797965/4ed5ccc7-cc4b-431d-8c7d-ae829bd2c43b)
>
> Seems the performance gain (last column in the picture) introduced by intrinsic is getting less and less when the data size increasing.
> So IMHO, when data size is big enough, it brings performance regression rather than performance gain.
@Hamlin-Li I modified `test/micro/org/openjdk/bench/java/util/TestCRC32C.java` a bit to see whether the regression appears with larger data sizes, and ran it on a VisionFive2 with Zba enabled:
Enabled intrinsic
| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ------------------------------- | ----------- | ----------- | ----- | ------- | ------ | ---------- |
| CRC32.TestCRC32.testCRC32Update | 131072 | thrpt | 40 | 2.841 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 262144 | thrpt | 40 | 1.420 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 524288 | thrpt | 40 | 0.709 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2097152 | thrpt | 40 | 0.176 | 0.001 | ops/ms |
Disabled intrinsic
| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ------------------------------- | ----------- | ----------- | ----- | ------- | ------ | ---------- |
| CRC32.TestCRC32.testCRC32Update | 131072 | thrpt | 40 | 2.729 | 0.003 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 262144 | thrpt | 40 | 1.367 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 524288 | thrpt | 40 | 0.684 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2097152 | thrpt | 40 | 0.170 | 0.001 | ops/ms |

| (count) | enabled/disabled |
| ------------------ | ------------------ |
| 131072 | 1.041040674 |
| 262144 | 1.038771031 |
| 524288 | 1.036549708 |
| 2097152 | 1.035294118 |
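
For anyone who wants to re-check the ratio column, here is a minimal sketch that recomputes it from the two score tables. The class and method names are made up for illustration; only the numbers come from the tables above.

```java
import java.util.Locale;

public class SpeedupRatios {
    // Scores in ops/ms, copied from the "Enabled intrinsic" and
    // "Disabled intrinsic" tables; index i belongs to COUNTS[i].
    static final int[] COUNTS = {131072, 262144, 524288, 2097152};
    static final double[] ENABLED = {2.841, 1.420, 0.709, 0.176};
    static final double[] DISABLED = {2.729, 1.367, 0.684, 0.170};

    // Throughput ratio: values above 1.0 mean the intrinsic is faster.
    static double ratio(double enabled, double disabled) {
        return enabled / disabled;
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNTS.length; i++) {
            // Locale.ROOT forces a '.' decimal separator on every machine.
            System.out.printf(Locale.ROOT, "%8d  %.9f%n",
                    COUNTS[i], ratio(ENABLED[i], DISABLED[i]));
        }
    }
}
```

The ratio shrinks slightly as the count grows but stays above 1.0 for every size measured here.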
Since the intrinsic stays roughly 3.5-4% ahead of C2-generated code with `-XX:+UseZba` at every size tested, with no regression, how about making the intrinsic Zba-exclusive?
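
For context, the operation being measured is a `java.util.zip.CRC32` update over a byte array of `count` bytes. The sketch below is an assumption about the general shape of that workload, not the benchmark's actual source; the class name, the fixed seed, and the buffer size taken from the smallest `(count)` row are all illustrative.

```java
import java.util.Random;
import java.util.zip.CRC32;

public class Crc32UpdateSketch {
    // Runs one CRC32 update over the whole array and returns the checksum.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // 131072 matches the smallest (count) value in the tables.
        byte[] data = new byte[131072];
        new Random(42).nextBytes(data); // fixed seed, chosen arbitrarily
        System.out.printf("crc32 = %08x%n", checksum(data));
    }
}
```

Whether such a call goes through the intrinsic or through C2-compiled Java code depends on the VM flags in effect; the post does not say which flag was used to switch the intrinsic off.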
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2032833530