RFR: 8302113: Improve CRC32 intrinsic with crypto pmull on AArch64 [v2]

Volker Simonis simonis at openjdk.org
Wed Feb 15 09:56:48 UTC 2023


On Mon, 13 Feb 2023 17:14:05 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

>> Instructions pmull and pmull2 support operating on 64-bit data in the Cryptographic Extension. The execution throughput of this form rises from 1 on Neoverse N1 to 4 on Neoverse V1, while the latency remains 2. The CRC32 instructions did not change: latency 2, throughput 1. As a result, computing CRC32 using pmull can perform better than using the crc32 instructions.
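For readers unfamiliar with the instruction: PMULL/PMULL2 on 64-bit lanes compute a carry-less (GF(2) polynomial) product of two 64-bit values into a 128-bit result. Pmull-based CRC kernels generally use this to fold large blocks of input by multiplying them with precomputed constants, so pmull throughput bounds how fast such a kernel can go. The C++ below is a minimal software model of that multiply, written only for illustration. `clmul64` and `clmul_self_check` are hypothetical names, not HotSpot code, and the real stub uses the hardware instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Software model of a 64x64 -> 128-bit carry-less multiply, the operation
// PMULL/PMULL2 perform on 64-bit lanes. Returns {low 64 bits, high 64 bits}.
std::pair<uint64_t, uint64_t> clmul64(uint64_t a, uint64_t b) {
    uint64_t lo = 0, hi = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1) {
            // XOR (not add) a shifted copy of 'a': no carries between bit positions.
            lo ^= a << i;
            if (i != 0) {
                hi ^= a >> (64 - i);  // bits shifted out of the low word
            }
        }
    }
    return {lo, hi};
}

void clmul_self_check() {
    // (x + 1)^2 = x^2 + 1 over GF(2): 3 * 3 gives 5 here, not 9.
    assert(clmul64(3, 3).first == 5 && clmul64(3, 3).second == 0);
    // x^63 * x = x^64, which lands in bit 0 of the high word.
    assert(clmul64(1ULL << 63, 2).first == 0 && clmul64(1ULL << 63, 2).second == 1);
}
```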
>> 
>> The following test has passed.
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java
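Whatever kernel is selected, the intrinsic must return exactly what `java.util.zip.CRC32` specifies: the standard reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). A bit-at-a-time reference, such as the sketch below, is one convenient oracle when checking a new kernel's output. `crc32_update` and `crc32_self_check` are hypothetical helpers, not part of the JDK or of the test named above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-at-a-time reference CRC-32 (reflected polynomial 0xEDB88320).
// 'crc' is the value returned by a previous call, or 0 to start.
uint32_t crc32_update(uint32_t crc, const unsigned char* data, std::size_t len) {
    crc = ~crc;  // undo the final inversion of the previous call
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; if the bit shifted out was 1, XOR in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;  // final inversion
}

void crc32_self_check() {
    const unsigned char msg[] = {'1', '2', '3', '4', '5', '6', '7', '8', '9'};
    // Standard CRC-32 check value for the ASCII string "123456789".
    assert(crc32_update(0, msg, sizeof msg) == 0xCBF43926u);
    // Splitting the input across two calls gives the same result.
    assert(crc32_update(crc32_update(0, msg, 4), msg + 4, 5) == 0xCBF43926u);
}
```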
>> 
>> The throughput reported by [the micro benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestCRC32.java) was measured on an EC2 c7g instance. The optimization improves throughput by 11-99% for inputs of at least 384 bytes; inputs of 256 bytes or less are essentially unchanged.
>> 
>> | input (bytes) | 64     | 128    | 256    | 384    | 511    | 512    | 1,024  |
>> | ------------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
>> | improvement   | 0.02%  | 0.02%  | 0.00%  | 16.00% | 11.94% | 34.75% | 69.80% |
>> 
>> | input (bytes) | 2,048  | 4,096  | 8,192  | 16,384 | 32,768 | 65,536 |
>> | ------------- | ------ | ------ | ------ | ------ | ------ | ------ |
>> | improvement   | 77.61% | 92.33% | 95.98% | 97.95% | 99.33% | 98.36% |
>> 
>> 
>> Baseline
>> 
>> TestCRC32.testCRC32Update         64  thrpt   12  173126.358 ± 118.330  ops/ms
>> TestCRC32.testCRC32Update        128  thrpt   12  112910.118 ±  47.305  ops/ms
>> TestCRC32.testCRC32Update        256  thrpt   12   66601.990 ±   7.294  ops/ms
>> TestCRC32.testCRC32Update        384  thrpt   12   47229.319 ±   3.949  ops/ms
>> TestCRC32.testCRC32Update        511  thrpt   12   33733.119 ±   4.076  ops/ms
>> TestCRC32.testCRC32Update        512  thrpt   12   36584.565 ±   4.211  ops/ms
>> TestCRC32.testCRC32Update       1024  thrpt   12   19239.083 ±   1.040  ops/ms
>> TestCRC32.testCRC32Update       2048  thrpt   12    9875.652 ±   0.435  ops/ms
>> TestCRC32.testCRC32Update       4096  thrpt   12    5004.425 ±   0.290  ops/ms
>> TestCRC32.testCRC32Update       8192  thrpt   12    2519.185 ±   0.169  ops/ms
>> TestCRC32.testCRC32Update      16384  thrpt   12    1263.909 ±   0.194  ops/ms
>> TestCRC32.testCRC32Update      32768  thrpt   12     632.018 ±   0.053  ops/ms
>> TestCRC32.testCRC32Update      65536  thrpt   12     315.471 ±   0.095  ops/ms
>> 
>> 
>> Crypto pmull
>> 
>> TestCRC32.testCRC32Update         64  thrpt   12  173168.669 ±   4.746  ops/ms
>> TestCRC32.testCRC32Update        128  thrpt   12  112933.519 ±   4.583  ops/ms
>> TestCRC32.testCRC32Update        256  thrpt   12   66602.462 ±   3.150  ops/ms
>> TestCRC32.testCRC32Update        384  thrpt   12   54784.739 ±   2.110  ops/ms
>> TestCRC32.testCRC32Update        511  thrpt   12   37760.816 ±  69.911  ops/ms
>> TestCRC32.testCRC32Update        512  thrpt   12   49297.609 ±  21.983  ops/ms
>> TestCRC32.testCRC32Update       1024  thrpt   12   32667.507 ±  90.610  ops/ms
>> TestCRC32.testCRC32Update       2048  thrpt   12   17539.986 ± 511.416  ops/ms
>> TestCRC32.testCRC32Update       4096  thrpt   12    9625.249 ±   9.713  ops/ms
>> TestCRC32.testCRC32Update       8192  thrpt   12    4937.135 ±   6.121  ops/ms
>> TestCRC32.testCRC32Update      16384  thrpt   12    2501.936 ±   1.270  ops/ms
>> TestCRC32.testCRC32Update      32768  thrpt   12    1259.831 ±   0.119  ops/ms
>> TestCRC32.testCRC32Update      65536  thrpt   12     625.773 ±   0.242  ops/ms
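The improvement figures in the tables above appear to be the relative change in throughput, (pmull / baseline - 1) * 100. This formula is inferred from the numbers rather than stated in the message, but it reproduces the reported 16.00% (384 bytes) and 98.36% (65,536 bytes) from the Baseline and Crypto pmull runs. A minimal C++ sketch, using a hypothetical `improvement_percent` helper:

```cpp
#include <cassert>
#include <cmath>

// Relative throughput improvement in percent. Both arguments are
// throughputs in ops/ms, so higher is better.
double improvement_percent(double baseline_ops_ms, double pmull_ops_ms) {
    return (pmull_ops_ms / baseline_ops_ms - 1.0) * 100.0;
}

void improvement_self_check() {
    // 384-byte row: 47229.319 -> 54784.739 ops/ms, reported as 16.00%.
    assert(std::fabs(improvement_percent(47229.319, 54784.739) - 16.00) < 0.005);
    // 512-byte row: 36584.565 -> 49297.609 ops/ms, reported as 34.75%.
    assert(std::fabs(improvement_percent(36584.565, 49297.609) - 34.75) < 0.005);
    // 65536-byte row: 315.471 -> 625.773 ops/ms, reported as 98.36%.
    assert(std::fabs(improvement_percent(315.471, 625.773) - 98.36) < 0.005);
}
```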
>
> Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Change to UseCryptoPmullForCRC32

In general your changes look good. I only have a few minor questions/comments below.

Also, can you please update the copyright year in the files you've touched?

Thank you and best regards,
Volker

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3777:

> 3775: 
> 3776:   if (UseCRC32) {
> 3777:       if (UseCryptoPmullForCRC32) {

Why is this inside `UseCRC32`? It means that if we set `-XX:-UseCRC32 -XX:+UseCryptoPmullForCRC32` on the command line, the `UseCryptoPmullForCRC32` code path will never be taken.
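To make the concern concrete, the sketch below models two ways the flag checks could be ordered. Everything here is hypothetical: `Crc32Kernel`, `select_nested`, and `select_independent` are illustrative names, `Fallback` stands for whatever non-crc32 path the assembler would otherwise take, and the independent ordering assumes the pmull kernel does not itself need `UseCRC32`, which the thread does not establish.

```cpp
#include <cassert>

// Illustrative stand-ins for the code paths; not HotSpot identifiers.
enum class Crc32Kernel { Fallback, Crc32Insn, CryptoPmull };

// Nested form, as in the hunk above: pmull is only reachable with UseCRC32 on.
Crc32Kernel select_nested(bool use_crc32, bool use_crypto_pmull) {
    if (use_crc32) {
        if (use_crypto_pmull) {
            return Crc32Kernel::CryptoPmull;
        }
        return Crc32Kernel::Crc32Insn;
    }
    return Crc32Kernel::Fallback;
}

// One possible independent form: the pmull flag is honored on its own.
Crc32Kernel select_independent(bool use_crc32, bool use_crypto_pmull) {
    if (use_crypto_pmull) {
        return Crc32Kernel::CryptoPmull;
    }
    return use_crc32 ? Crc32Kernel::Crc32Insn : Crc32Kernel::Fallback;
}

void flag_self_check() {
    // -XX:-UseCRC32 -XX:+UseCryptoPmullForCRC32
    assert(select_nested(false, true) == Crc32Kernel::Fallback);      // pmull silently ignored
    assert(select_independent(false, true) == Crc32Kernel::CryptoPmull);
    // -XX:+UseCRC32 -XX:-UseCryptoPmullForCRC32: both forms agree.
    assert(select_nested(true, false) == Crc32Kernel::Crc32Insn);
    assert(select_independent(true, false) == Crc32Kernel::Crc32Insn);
}
```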

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 257:

> 255:   }
> 256: 
> 257:   if (UseCryptoPmullForCRC32 && (!VM_Version::supports_sha3() || !VM_Version::supports_pmull())) {

Why does `UseCryptoPmullForCRC32` depend on `VM_Version::supports_sha3()`?

Does `VM_Version::supports_pmull()` really check for the 64-bit `pmull` from the Cryptographic Extension?
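On Linux/AArch64 the two features are reported as separate bits of `AT_HWCAP` (readable with `getauxval(AT_HWCAP)`). `HWCAP_PMULL` (bit 4) covers the 64-bit PMULL/PMULL2 forms, unlike the 8-bit PMULL that is part of base Advanced SIMD, while `HWCAP_SHA3` (bit 17) covers EOR3, RAX1, XAR, and BCAX. The sketch below decodes those bits from a raw value so it runs on any platform. `decode_hwcap` and `CryptoFeatures` are hypothetical names, and the sketch assumes, without checking the HotSpot sources, that `supports_pmull()` and `supports_sha3()` mirror these two bits. One possible reason for a SHA3 requirement would be use of EOR3 in the folding loop, but that is a guess, not something stated in the PR.

```cpp
#include <cassert>
#include <cstdint>

// AT_HWCAP bit positions on Linux/AArch64 (uapi header asm/hwcap.h).
constexpr uint64_t kHwcapPmull = 1ULL << 4;   // HWCAP_PMULL: 64-bit PMULL/PMULL2
constexpr uint64_t kHwcapSha3  = 1ULL << 17;  // HWCAP_SHA3: EOR3, RAX1, XAR, BCAX

struct CryptoFeatures {
    bool pmull64;
    bool sha3;
};

// Decode a raw AT_HWCAP value; the two features are independent bits.
CryptoFeatures decode_hwcap(uint64_t hwcap) {
    return {(hwcap & kHwcapPmull) != 0, (hwcap & kHwcapSha3) != 0};
}

void hwcap_self_check() {
    // A CPU may report 64-bit PMULL without SHA3.
    CryptoFeatures f = decode_hwcap(kHwcapPmull);
    assert(f.pmull64 && !f.sha3);
}
```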

-------------

PR: https://git.openjdk.org/jdk/pull/12480


More information about the hotspot-dev mailing list