RFR: 8302113: Improve CRC32 intrinsic with crypto pmull on AArch64 [v3]

Volker Simonis simonis at openjdk.org
Thu Feb 16 12:37:30 UTC 2023


On Thu, 16 Feb 2023 06:13:03 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

>> With the Cryptographic Extension, the pmull and pmull2 instructions support operating on 64-bit data. The execution throughput of this form rises from 1 on Neoverse N1 to 4 on Neoverse V1, while the latency remains 2. The CRC32 instructions did not change: latency 2, throughput 1. As a result, computing CRC32 using pmull could perform better than using the crc32 instruction.
>> 
>> The following test has passed.
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java
>> 
>> The throughput reported by [the micro benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestCRC32.java) was measured on an EC2 c7g instance. The optimization shows an 11% to 99% throughput improvement over the baseline when the input is at least 384 bytes.
>> 
>> | input (bytes) | 64         | 128        | 256        | 384        | 511        | 512        | 1,024      |
>> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
>> |  improvement  | 0.02%      | 0.02%      | 0.00%      | 16.00%     | 11.94%     | 34.75%     | 69.80%     |
>> 
>> | input (bytes) | 2,048      | 4,096      | 8,192      | 16,384     | 32,768     | 65,536     |
>> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
>> |  improvement  | 77.61%     | 92.33%     | 95.98%     | 97.95%     | 99.33%     | 98.36%     |
>> 
>> 
>> Baseline
>> 
>> TestCRC32.testCRC32Update         64  thrpt   12  173126.358 ± 118.330  ops/ms
>> TestCRC32.testCRC32Update        128  thrpt   12  112910.118 ±  47.305  ops/ms
>> TestCRC32.testCRC32Update        256  thrpt   12   66601.990 ±   7.294  ops/ms
>> TestCRC32.testCRC32Update        384  thrpt   12   47229.319 ±   3.949  ops/ms
>> TestCRC32.testCRC32Update        511  thrpt   12   33733.119 ±   4.076  ops/ms
>> TestCRC32.testCRC32Update        512  thrpt   12   36584.565 ±   4.211  ops/ms
>> TestCRC32.testCRC32Update       1024  thrpt   12   19239.083 ±   1.040  ops/ms
>> TestCRC32.testCRC32Update       2048  thrpt   12    9875.652 ±   0.435  ops/ms
>> TestCRC32.testCRC32Update       4096  thrpt   12    5004.425 ±   0.290  ops/ms
>> TestCRC32.testCRC32Update       8192  thrpt   12    2519.185 ±   0.169  ops/ms
>> TestCRC32.testCRC32Update      16384  thrpt   12    1263.909 ±   0.194  ops/ms
>> TestCRC32.testCRC32Update      32768  thrpt   12     632.018 ±   0.053  ops/ms
>> TestCRC32.testCRC32Update      65536  thrpt   12     315.471 ±   0.095  ops/ms
>> 
>> 
>> Crypto pmull
>> 
>> TestCRC32.testCRC32Update         64  thrpt   12  173168.669 ±   4.746  ops/ms
>> TestCRC32.testCRC32Update        128  thrpt   12  112933.519 ±   4.583  ops/ms
>> TestCRC32.testCRC32Update        256  thrpt   12   66602.462 ±   3.150  ops/ms
>> TestCRC32.testCRC32Update        384  thrpt   12   54784.739 ±   2.110  ops/ms
>> TestCRC32.testCRC32Update        511  thrpt   12   37760.816 ±  69.911  ops/ms
>> TestCRC32.testCRC32Update        512  thrpt   12   49297.609 ±  21.983  ops/ms
>> TestCRC32.testCRC32Update       1024  thrpt   12   32667.507 ±  90.610  ops/ms
>> TestCRC32.testCRC32Update       2048  thrpt   12   17539.986 ± 511.416  ops/ms
>> TestCRC32.testCRC32Update       4096  thrpt   12    9625.249 ±   9.713  ops/ms
>> TestCRC32.testCRC32Update       8192  thrpt   12    4937.135 ±   6.121  ops/ms
>> TestCRC32.testCRC32Update      16384  thrpt   12    2501.936 ±   1.270  ops/ms
>> TestCRC32.testCRC32Update      32768  thrpt   12    1259.831 ±   0.119  ops/ms
>> TestCRC32.testCRC32Update      65536  thrpt   12     625.773 ±   0.242  ops/ms
>
> Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make UseCryptoPmullForCRC32 independent of UseCRC32

Looks good now. Thanks for the explanation and for decoupling `UseCryptoPmullForCRC32` and `UseCRC32`.
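
For anyone who wants to sanity-check the flag combinations locally, here is a minimal sketch (not part of the PR). It compares `java.util.zip.CRC32` against a plain bitwise reference at the input sizes used in the quoted benchmark. The class name `Crc32FlagCheck` and the helper names are made up for illustration. I am also assuming that the flags can be toggled on an AArch64 build as `-XX:+UseCryptoPmullForCRC32` and `-XX:-UseCRC32`, and that `update()` gets compiled so the intrinsic is actually exercised, which a short run like this does not guarantee.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical helper class, not part of the PR.
class Crc32FlagCheck {
    // Bitwise reference CRC-32 (reflected polynomial 0xEDB88320).
    // It does not go through java.util.zip.CRC32, so no intrinsic is involved.
    static long referenceCrc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                // Shift right by one; XOR in the polynomial if the low bit was set.
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // The JDK path, which may be intrinsified depending on the VM flags.
    static long jdkCrc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Sizes taken from the quoted benchmark, including both sides of 384 and 512.
        int[] sizes = {64, 128, 256, 384, 511, 512, 1024, 65536};
        Random random = new Random(42); // fixed seed for repeatable input
        for (int size : sizes) {
            byte[] data = new byte[size];
            random.nextBytes(data);
            long expected = referenceCrc32(data);
            long actual = jdkCrc32(data);
            if (expected != actual) {
                throw new AssertionError("CRC32 mismatch at size " + size);
            }
        }
        System.out.println("all sizes match");
    }
}
```

Running it once per flag combination (each flag on and off) should print the same result every time if the two flags are really independent.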

-------------

Marked as reviewed by simonis (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12480
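
P.S. The improvement column in the quoted table appears to be the relative throughput gain, (patched / baseline - 1) * 100. That formula is my reading of the numbers, not something stated in the PR. A small sketch that recomputes a few rows from the quoted ops/ms figures (the class and method names are made up for illustration):

```java
// Hypothetical helper class, not part of the PR.
class ImprovementCheck {
    // Relative throughput gain of the patched build over the baseline, in percent.
    static double improvementPercent(double baseline, double patched) {
        return (patched / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Throughput in ops/ms, copied from the quoted "Baseline" and "Crypto pmull" runs.
        int[] sizes = {384, 512, 1024, 65536};
        double[] baseline = {47229.319, 36584.565, 19239.083, 315.471};
        double[] patched = {54784.739, 49297.609, 32667.507, 625.773};
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("%6d bytes: %.2f%%%n",
                    sizes[i], improvementPercent(baseline[i], patched[i]));
        }
    }
}
```

For these four rows the recomputed values agree with the table to two decimal places.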

