RFR: 8302113: Improve CRC32 intrinsic with crypto pmull on AArch64 [v2]
Volker Simonis
simonis at openjdk.org
Wed Feb 15 09:56:48 UTC 2023
On Mon, 13 Feb 2023 17:14:05 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:
>> The pmull and pmull2 instructions support operating on 64-bit data as part of the Cryptographic Extension. The execution throughput of this form rises from 1 on Neoverse N1 to 4 on Neoverse V1, while the latency remains 2. The CRC32 instructions did not change: latency 2, throughput 1. As a result, computing CRC32 using pmull could perform better than using the crc32 instructions.
>>
>> The following test has passed.
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java
>>
>> The throughput reported by [the micro benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestCRC32.java) is measured on an EC2 c7g instance. The optimization shows 11 - 99% improvement when the input is at least 384 bytes.
>>
>> | input | 64 | 128 | 256 | 384 | 511 | 512 | 1,024 |
>> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
>> | improvement | 0.02% | 0.02% | 0.00% | 16.00% | 11.94% | 34.75% | 69.80% |
>>
>> | input | 2,048 | 4,096 | 8,192 | 16,384 | 32,768 | 65,536 |
>> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
>> | improvement | 77.61% | 92.33% | 95.98% | 97.95% | 99.33% | 98.36% |
>>
>>
>> Baseline
>>
>> TestCRC32.testCRC32Update 64 thrpt 12 173126.358 ± 118.330 ops/ms
>> TestCRC32.testCRC32Update 128 thrpt 12 112910.118 ± 47.305 ops/ms
>> TestCRC32.testCRC32Update 256 thrpt 12 66601.990 ± 7.294 ops/ms
>> TestCRC32.testCRC32Update 384 thrpt 12 47229.319 ± 3.949 ops/ms
>> TestCRC32.testCRC32Update 511 thrpt 12 33733.119 ± 4.076 ops/ms
>> TestCRC32.testCRC32Update 512 thrpt 12 36584.565 ± 4.211 ops/ms
>> TestCRC32.testCRC32Update 1024 thrpt 12 19239.083 ± 1.040 ops/ms
>> TestCRC32.testCRC32Update 2048 thrpt 12 9875.652 ± 0.435 ops/ms
>> TestCRC32.testCRC32Update 4096 thrpt 12 5004.425 ± 0.290 ops/ms
>> TestCRC32.testCRC32Update 8192 thrpt 12 2519.185 ± 0.169 ops/ms
>> TestCRC32.testCRC32Update 16384 thrpt 12 1263.909 ± 0.194 ops/ms
>> TestCRC32.testCRC32Update 32768 thrpt 12 632.018 ± 0.053 ops/ms
>> TestCRC32.testCRC32Update 65536 thrpt 12 315.471 ± 0.095 ops/ms
>>
>>
>> Crypto pmull
>>
>> TestCRC32.testCRC32Update 64 thrpt 12 173168.669 ± 4.746 ops/ms
>> TestCRC32.testCRC32Update 128 thrpt 12 112933.519 ± 4.583 ops/ms
>> TestCRC32.testCRC32Update 256 thrpt 12 66602.462 ± 3.150 ops/ms
>> TestCRC32.testCRC32Update 384 thrpt 12 54784.739 ± 2.110 ops/ms
>> TestCRC32.testCRC32Update 511 thrpt 12 37760.816 ± 69.911 ops/ms
>> TestCRC32.testCRC32Update 512 thrpt 12 49297.609 ± 21.983 ops/ms
>> TestCRC32.testCRC32Update 1024 thrpt 12 32667.507 ± 90.610 ops/ms
>> TestCRC32.testCRC32Update 2048 thrpt 12 17539.986 ± 511.416 ops/ms
>> TestCRC32.testCRC32Update 4096 thrpt 12 9625.249 ± 9.713 ops/ms
>> TestCRC32.testCRC32Update 8192 thrpt 12 4937.135 ± 6.121 ops/ms
>> TestCRC32.testCRC32Update 16384 thrpt 12 2501.936 ± 1.270 ops/ms
>> TestCRC32.testCRC32Update 32768 thrpt 12 1259.831 ± 0.119 ops/ms
>> TestCRC32.testCRC32Update 65536 thrpt 12 625.773 ± 0.242 ops/ms
>
> Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision:
>
> Change to UseCryptoPmullForCRC32
In general your changes look good. I only have a few minor questions/comments below.
Also, can you please update the copyright year in the files you've touched?
Thank you and best regards,
Volker
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3777:
> 3775:
> 3776: if (UseCRC32) {
> 3777: if (UseCryptoPmullForCRC32) {
Why is this inside `UseCRC32`? It means that if we set `-XX:-UseCRC32 -XX:+UseCryptoPmullForCRC32` on the command line, we won't get `UseCryptoPmullForCRC32`.
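To make the concern concrete, here is a minimal, self-contained model of the nesting. It is only a sketch: `CrcPath`, `select_nested` and `select_independent` are hypothetical names invented for illustration, not HotSpot code, and the booleans stand in for the `UseCRC32` and `UseCryptoPmullForCRC32` flags.

```cpp
// Hypothetical model of the flag nesting; not HotSpot code (requires C++14).
enum class CrcPath { CryptoPmull, Crc32Instruction, Fallback };

// Same shape as the quoted code: the pmull check sits inside the UseCRC32 check.
constexpr CrcPath select_nested(bool use_crc32, bool use_crypto_pmull) {
  if (use_crc32) {
    if (use_crypto_pmull) {
      return CrcPath::CryptoPmull;
    }
    return CrcPath::Crc32Instruction;
  }
  return CrcPath::Fallback;  // use_crypto_pmull is never consulted on this path
}

// Alternative shape (an assumption, not the PR's code): the pmull flag is checked on its own.
constexpr CrcPath select_independent(bool use_crc32, bool use_crypto_pmull) {
  if (use_crypto_pmull) {
    return CrcPath::CryptoPmull;
  }
  if (use_crc32) {
    return CrcPath::Crc32Instruction;
  }
  return CrcPath::Fallback;
}

// Models -XX:-UseCRC32 -XX:+UseCryptoPmullForCRC32.
static_assert(select_nested(false, true) == CrcPath::Fallback,
              "nested shape ignores the pmull flag when UseCRC32 is off");
static_assert(select_independent(false, true) == CrcPath::CryptoPmull,
              "independent shape honors the pmull flag");
```

Whether the independent shape is appropriate depends on whether the pmull path itself relies on the crc32 instructions, which this sketch does not model.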
src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 257:
> 255: }
> 256:
> 257: if (UseCryptoPmullForCRC32 && (!VM_Version::supports_sha3() || !VM_Version::supports_pmull())) {
Why does `UseCryptoPmullForCRC32` depend on `VM_Version::supports_sha3()`?
Does `VM_Version::supports_pmull()` really check for the `pmull` in the cryptographic extension?
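As a sketch of what the quoted condition implies, the following models it with plain booleans. `CpuFeatures` and `effective_use_crypto_pmull` are hypothetical names, not HotSpot code, and the sketch assumes that the (unquoted) body of the `if` turns the flag off.

```cpp
// Hypothetical model of the flag validation; not HotSpot code (requires C++14).
struct CpuFeatures {
  bool sha3;   // stands in for VM_Version::supports_sha3()
  bool pmull;  // stands in for VM_Version::supports_pmull()
};

// Assumption: when the quoted condition holds, the flag is disabled.
// The flag therefore stays on only if it was requested and both features are present.
constexpr bool effective_use_crypto_pmull(bool requested, CpuFeatures f) {
  return requested && !(!f.sha3 || !f.pmull);
}

// A CPU with pmull but without sha3 loses the flag under this condition.
static_assert(!effective_use_crypto_pmull(true, CpuFeatures{false, true}),
              "pmull alone is not sufficient under the quoted condition");
static_assert(effective_use_crypto_pmull(true, CpuFeatures{true, true}),
              "both features present keeps the flag on");
```

Under this model, a CPU that reports pmull but not sha3 would not get the optimization even when the flag is requested explicitly.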
-------------
PR: https://git.openjdk.org/jdk/pull/12480