RFR: 8302783: Improve CRC32C intrinsic with crypto pmull on AArch64
Paul Hohensee
phh at openjdk.org
Thu Mar 2 22:26:11 UTC 2023
On Fri, 17 Feb 2023 19:59:24 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:
> This change adds a pmull-based CRC32C intrinsic that outperforms the existing crc32c-instruction-based intrinsic on Neoverse V1. The benchmark shows a 10 - 99% improvement. The gain comes from the execution throughput of pmull/pmull2 increasing from 1 on Neoverse N1 to 4 on Neoverse V1, with latency unchanged at 2, while the throughput of the CRC32C instructions did not change.
>
> The pmull-based CRC32C intrinsic is enabled by the existing option UseCryptoPmullForCRC32, which also enables the pmull-based CRC32 intrinsic. The option requires crc32c instructions, eor3 from SHA3, and 64-bit pmull/pmull2 from the Cryptographic Extension.
>
> With this change, there will be only two different CRC32C intrinsics, crc32c and pmull, while there are four CRC32 intrinsics.
>
> The following test has passed.
> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java
>
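For readers unfamiliar with the code path involved: the intrinsic sits behind `java.util.zip.CRC32C`, so a quick sanity check of whichever implementation is active can be sketched as below. The class name `Crc32cCheck` is made up for illustration, and passing the option as `-XX:+UseCryptoPmullForCRC32` is an assumption based on the usual HotSpot boolean-flag form. The input `"123456789"` is the standard CRC check string, for which CRC-32C yields `0xE3069283`.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

// Hypothetical helper class, not part of the patch or the jtreg test.
public class Crc32cCheck {
    // Computes the CRC-32C of the whole array through the public JDK API.
    static long crc32c(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The result must be the same with or without the pmull intrinsic.
        System.out.printf("%08x%n", crc32c(input));
    }
}
```

A single short call like this may run in the interpreter and never reach the compiled intrinsic, so it complements TestCRC32C.java rather than replacing it.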
> The throughput reported by [the micro benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestCRC32C.java) is measured on an EC2 c7g instance. The optimization shows 10 - 99% improvement when the input is at least 384 bytes.
>
> | input | 64 | 128 | 256 | 384 | 511 | 512 | 1,024 |
> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
> | improvement | 1.60% | 0.00% | 0.00% | 15.24% | 10.76% | 34.32% | 72.39% |
>
> | input | 2,048 | 4,096 | 8,192 | 16,384 | 32,768 | 65,536 |
> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
> | improvement | 84.96% | 92.59% | 96.19% | 98.02% | 99.32% | 98.36% |
>
>
> Baseline
>
> Benchmark (count) Mode Cnt Score Error Units
> TestCRC32C.testCRC32CUpdate 64 thrpt 12 196575.739 ± 1824.113 ops/ms
> TestCRC32C.testCRC32CUpdate 128 thrpt 12 123666.570 ± 2.730 ops/ms
> TestCRC32C.testCRC32CUpdate 256 thrpt 12 70188.989 ± 2.002 ops/ms
> TestCRC32C.testCRC32CUpdate 384 thrpt 12 49000.690 ± 1.421 ops/ms
> TestCRC32C.testCRC32CUpdate 511 thrpt 12 34106.279 ± 25.390 ops/ms
> TestCRC32C.testCRC32CUpdate 512 thrpt 12 37638.349 ± 1.039 ops/ms
> TestCRC32C.testCRC32CUpdate 1024 thrpt 12 19526.513 ± 0.439 ops/ms
> TestCRC32C.testCRC32CUpdate 2048 thrpt 12 9951.392 ± 4.803 ops/ms
> TestCRC32C.testCRC32CUpdate 4096 thrpt 12 5023.268 ± 0.240 ops/ms
> TestCRC32C.testCRC32CUpdate 8192 thrpt 12 2523.877 ± 0.062 ops/ms
> TestCRC32C.testCRC32CUpdate 16384 thrpt 12 1265.011 ± 0.047 ops/ms
> TestCRC32C.testCRC32CUpdate 32768 thrpt 12 632.291 ± 0.058 ops/ms
> TestCRC32C.testCRC32CUpdate 65536 thrpt 12 315.396 ± 0.160 ops/ms
>
>
> Crypto pmull
>
> Benchmark (count) Mode Cnt Score Error Units
> TestCRC32C.testCRC32CUpdate 64 thrpt 12 199726.599 ± 166.477 ops/ms
> TestCRC32C.testCRC32CUpdate 128 thrpt 12 123669.385 ± 1.821 ops/ms
> TestCRC32C.testCRC32CUpdate 256 thrpt 12 70188.727 ± 1.313 ops/ms
> TestCRC32C.testCRC32CUpdate 384 thrpt 12 56468.837 ± 76.524 ops/ms
> TestCRC32C.testCRC32CUpdate 511 thrpt 12 37777.205 ± 406.431 ops/ms
> TestCRC32C.testCRC32CUpdate 512 thrpt 12 50554.555 ± 17.169 ops/ms
> TestCRC32C.testCRC32CUpdate 1024 thrpt 12 33661.006 ± 140.471 ops/ms
> TestCRC32C.testCRC32CUpdate 2048 thrpt 12 18406.482 ± 205.952 ops/ms
> TestCRC32C.testCRC32CUpdate 4096 thrpt 12 9674.159 ± 20.390 ops/ms
> TestCRC32C.testCRC32CUpdate 8192 thrpt 12 4951.562 ± 6.566 ops/ms
> TestCRC32C.testCRC32CUpdate 16384 thrpt 12 2504.970 ± 1.883 ops/ms
> TestCRC32C.testCRC32CUpdate 32768 thrpt 12 1260.278 ± 0.484 ops/ms
> TestCRC32C.testCRC32CUpdate 65536 thrpt 12 625.608 ± 0.300 ops/ms
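The improvement rows in the tables above appear to be the relative throughput gain, (pmull score / baseline score - 1) * 100. A minimal sketch that recomputes three of the entries from the quoted JMH scores follows; the class and method names are hypothetical, and the formula is inferred from the numbers rather than stated in the PR.

```java
// Hypothetical helper for cross-checking the improvement table.
public class Improvement {
    // Relative throughput gain of the candidate over the baseline, in percent.
    static double improvementPercent(double baseline, double candidate) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        int[] sizes = {384, 512, 1024};
        // Scores in ops/ms copied from the two JMH runs quoted above.
        double[] baseline = {49000.690, 37638.349, 19526.513};
        double[] pmull = {56468.837, 50554.555, 33661.006};
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("%d bytes: %.2f%%%n",
                    sizes[i], improvementPercent(baseline[i], pmull[i]));
        }
    }
}
```

For these three sizes the recomputed values agree with the table entries (15.24%, 34.32%, 72.39%).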
Lgtm.
The linux-x86 pre-submit test failure is caused by a test using -XX:+UseCompressedClassPointers, which is an invalid switch for 32-bit JVMs.
The linux-cross-compile pre-submit test failure is a compile-time failure in src/hotspot/cpu/arm/interpreterRT_arm.cpp, a file that this patch does not touch.
-------------
Marked as reviewed by phh (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12624