RFR: 8302113: Improve CRC32 intrinsic with crypto pmull on AArch64

Yi-Fan Tsai duke at openjdk.org
Fri Feb 10 21:20:44 UTC 2023


The pmull and pmull2 instructions support operating on 64-bit data when the Cryptographic Extension is present. The execution throughput of this form rises from 1 on Neoverse N1 to 4 on Neoverse V1, while the latency remains 2. The CRC32 instructions have not changed: latency 2, throughput 1. As a result, computing CRC32 with pmull can perform better than using the crc32 instructions.
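For context, here is a minimal sketch of the Java-level operation affected by this change, assuming the intrinsic is reached through java.util.zip.CRC32 (as the jtreg test path below suggests). It compares the library result with a plain bitwise CRC-32. The class and method names (Crc32Check, referenceCrc32, libraryCrc32) are hypothetical and not part of the patch, and whether the JIT actually uses the pmull path depends on the CPU and JVM configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper class for illustration; not part of the patch.
public class Crc32Check {

    // Bitwise CRC-32 using the reflected polynomial 0xEDB88320, one bit per step.
    static long referenceCrc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                int mask = -(crc & 1);              // all ones if the low bit is set
                crc = (crc >>> 1) ^ (0xEDB88320 & mask);
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // The library call; HotSpot may replace the update loop with an intrinsic.
    static long libraryCrc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Standard CRC-32 check input; the expected value is 0xCBF43926.
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("library   = %08x%n", libraryCrc32(check));
        System.out.printf("reference = %08x%n", referenceCrc32(check));

        // A 1,024-byte buffer, one of the sizes used in the benchmark below.
        byte[] buf = new byte[1024];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) (i * 31);
        }
        System.out.println("match = " + (libraryCrc32(buf) == referenceCrc32(buf)));
    }
}
```

Any code path chosen by the JIT has to produce the same value as the bitwise version, so a comparison like this is a quick sanity check for a given input size.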

The following test has passed.
test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java

The throughput reported by the microbenchmark was measured on an EC2 c7g instance. The optimization improves throughput by 11-99% when the input is at least 384 bytes.

| input size (bytes)  | 64         | 128        | 256        | 384        | 511        | 512        | 1,024      |
| ------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| CRC32 improvement   | 0.02%      | 0.02%      | 0.00%      | 16.00%     | 11.94%     | 34.75%     | 69.80%     |

| input size (bytes)  | 2,048      | 4,096      | 8,192      | 16,384     | 32,768     | 65,536     |
| ------------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| CRC32 improvement   | 77.61%     | 92.33%     | 95.98%     | 97.95%     | 99.33%     | 98.36%     |


Baseline

TestCRC32.testCRC32Update         64  thrpt   12  173126.358 ± 118.330  ops/ms
TestCRC32.testCRC32Update        128  thrpt   12  112910.118 ±  47.305  ops/ms
TestCRC32.testCRC32Update        256  thrpt   12   66601.990 ±   7.294  ops/ms
TestCRC32.testCRC32Update        384  thrpt   12   47229.319 ±   3.949  ops/ms
TestCRC32.testCRC32Update        511  thrpt   12   33733.119 ±   4.076  ops/ms
TestCRC32.testCRC32Update        512  thrpt   12   36584.565 ±   4.211  ops/ms
TestCRC32.testCRC32Update       1024  thrpt   12   19239.083 ±   1.040  ops/ms
TestCRC32.testCRC32Update       2048  thrpt   12    9875.652 ±   0.435  ops/ms
TestCRC32.testCRC32Update       4096  thrpt   12    5004.425 ±   0.290  ops/ms
TestCRC32.testCRC32Update       8192  thrpt   12    2519.185 ±   0.169  ops/ms
TestCRC32.testCRC32Update      16384  thrpt   12    1263.909 ±   0.194  ops/ms
TestCRC32.testCRC32Update      32768  thrpt   12     632.018 ±   0.053  ops/ms
TestCRC32.testCRC32Update      65536  thrpt   12     315.471 ±   0.095  ops/ms


Crypto pmull

TestCRC32.testCRC32Update         64  thrpt   12  173168.669 ±   4.746  ops/ms
TestCRC32.testCRC32Update        128  thrpt   12  112933.519 ±   4.583  ops/ms
TestCRC32.testCRC32Update        256  thrpt   12   66602.462 ±   3.150  ops/ms
TestCRC32.testCRC32Update        384  thrpt   12   54784.739 ±   2.110  ops/ms
TestCRC32.testCRC32Update        511  thrpt   12   37760.816 ±  69.911  ops/ms
TestCRC32.testCRC32Update        512  thrpt   12   49297.609 ±  21.983  ops/ms
TestCRC32.testCRC32Update       1024  thrpt   12   32667.507 ±  90.610  ops/ms
TestCRC32.testCRC32Update       2048  thrpt   12   17539.986 ± 511.416  ops/ms
TestCRC32.testCRC32Update       4096  thrpt   12    9625.249 ±   9.713  ops/ms
TestCRC32.testCRC32Update       8192  thrpt   12    4937.135 ±   6.121  ops/ms
TestCRC32.testCRC32Update      16384  thrpt   12    2501.936 ±   1.270  ops/ms
TestCRC32.testCRC32Update      32768  thrpt   12    1259.831 ±   0.119  ops/ms
TestCRC32.testCRC32Update      65536  thrpt   12     625.773 ±   0.242  ops/ms
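
The improvement percentages in the tables can be reproduced from the Baseline and Crypto pmull listings as (pmull score / baseline score - 1) * 100. Below is a minimal sketch of that arithmetic for three of the input sizes. The class and method names (ThroughputGain, improvementPercent) are hypothetical, and the scores are copied from the listings.

```java
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the patch.
public class ThroughputGain {

    // Relative throughput gain in percent, given two ops/ms scores.
    static double improvementPercent(double baselineOpsPerMs, double optimizedOpsPerMs) {
        return (optimizedOpsPerMs / baselineOpsPerMs - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores copied from the Baseline and Crypto pmull listings.
        int[] sizes       = {384, 512, 65536};
        double[] baseline = {47229.319, 36584.565, 315.471};
        double[] pmull    = {54784.739, 49297.609, 625.773};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf(Locale.ROOT, "%6d bytes: %.2f%%%n",
                    sizes[i], improvementPercent(baseline[i], pmull[i]));
        }
    }
}
```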

-------------

Commit messages:
 - Remove CRC32-C
 - Support CRC32-C
 - Merge master
 - Add microbenchmark TestCRC32
 - Change code alignment
 - Separate code paths
 - Enable on Neoverse V1
 - Disable the optimization by default
 - Reduce data dependency before load
 - PMULL

Changes: https://git.openjdk.org/jdk/pull/12480/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12480&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302113
  Stats: 214 lines in 5 files changed: 213 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12480.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12480/head:pull/12480

PR: https://git.openjdk.org/jdk/pull/12480

