(8u) RFR: 8131062: aarch64: add support for GHASH acceleration
Hohensee, Paul
hohensee at amazon.com
Tue Aug 17 23:50:38 UTC 2021
Lgtm.
Thanks,
Paul
-----Original Message-----
From: jdk8u-dev <jdk8u-dev-retn at openjdk.java.net> on behalf of "Liu, Xin" <xxinliu at amazon.com>
Date: Monday, August 16, 2021 at 11:00 AM
To: "jdk8u-dev at openjdk.java.net" <jdk8u-dev at openjdk.java.net>
Subject: (8u) RFR: 8131062: aarch64: add support for GHASH acceleration
Hi,
I'd like to request a review of 8131062 for jdk8u. This patch
accelerates AES/GCM by 4~5x on ARMv8 by leveraging NEON instructions.
It does not apply cleanly to jdk8u/hotspot, but integrating it is
trivial; I only adjusted code locations.
Bug: https://bugs.openjdk.java.net/browse/JDK-8131062
webrev: https://cr.openjdk.java.net/~xliu/8131062/webrev/
Background:
GHASH.update is the step in AES/GCM that authenticates the encrypted data
and the additional authenticated data (AAD). The cost of authentication may
dominate the AES/GCM cryptographic procedure because:
1. HotSpot already uses the dedicated crypto instructions to
accelerate AES.
2. Applications feed long AAD. In particular, GMAC is the
degenerate form of AES/GCM, in which the encrypted data is empty.
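To make the GMAC case concrete, here is a minimal sketch that computes a GMAC tag through the standard JCE API by passing only AAD and no plaintext. The class and method names (GmacSketch, gmac) are made up for illustration and are not part of the patch; in real use the IV must never repeat under the same key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, for illustration only.
final class GmacSketch {
    // Returns the 16-byte GCM tag over the AAD alone (no encrypted data).
    static byte[] gmac(byte[] key, byte[] iv, byte[] aad) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE,
                        new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(128, iv)); // 128-bit tag
            cipher.updateAAD(aad);   // everything goes through the authentication path
            return cipher.doFinal(); // empty plaintext, so the output is just the tag
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key, do not use in production
        byte[] iv = new byte[12];  // all-zero demo IV, must be unique per key in practice
        byte[] aad = "header to authenticate".getBytes(StandardCharsets.UTF_8);
        System.out.println("tag length = " + gmac(key, iv, aad).length);
    }
}
```

With no plaintext there is no payload encryption work, so the authentication step accounts for nearly all of the cost of such a call.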
In one application, we have seen the AES/GCM kernel
(com.sun.crypto.provider.GaloisCounterMode.doLastBlock) spend 85% of its
time in GHASH.update on ARMv8.
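For readers who have not looked at what GHASH.update computes, below is a minimal bit-by-bit sketch of the GF(2^128) block multiplication as described in NIST SP 800-38D. It is an illustration only: it is neither the code in this patch nor the code in com.sun.crypto.provider.GHASH, and the names GhashSketch, multiply and update are hypothetical. A 128-bit block is held as two big-endian longs {hi, lo}.

```java
// Hypothetical reference sketch, for illustration only.
final class GhashSketch {
    // Reduction constant R = 11100001 || 0^120, upper 64 bits.
    private static final long R_HI = 0xE100000000000000L;

    // Multiplies x and y in GF(2^128) using GCM's bit ordering
    // (bit 0 is the most significant bit of the first byte).
    static long[] multiply(long[] x, long[] y) {
        long zHi = 0L, zLo = 0L;
        long vHi = y[0], vLo = y[1];
        for (int i = 0; i < 128; i++) {
            long word = (i < 64) ? x[0] : x[1];
            if (((word >>> (63 - (i & 63))) & 1L) != 0) {
                zHi ^= vHi; // accumulate V when bit i of X is set
                zLo ^= vLo;
            }
            boolean carry = (vLo & 1L) != 0;
            vLo = (vLo >>> 1) | (vHi << 63); // 128-bit shift right by one
            vHi >>>= 1;
            if (carry) {
                vHi ^= R_HI; // reduce modulo the field polynomial
            }
        }
        return new long[] {zHi, zLo};
    }

    // One GHASH step: fold a block into the running state, then multiply by the subkey H.
    static long[] update(long[] state, long[] block, long[] h) {
        return multiply(new long[] {state[0] ^ block[0], state[1] ^ block[1]}, h);
    }

    public static void main(String[] args) {
        long[] h = {0x66E94BD4EF8A2C3BL, 0x884CFA59CA342B2EL}; // arbitrary demo subkey
        long[] block = {0x0123456789ABCDEFL, 0xFEDCBA9876543210L};
        long[] state = update(new long[2], block, h);
        System.out.printf("%016x%016x%n", state[0], state[1]);
    }
}
```

The loop runs 128 iterations for every 16-byte block, which gives a sense of why a plain implementation is costly on long inputs and why replacing it with an intrinsic pays off.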
Test:
All test tiers passed. In particular, I ran the test group jdk_security2
with the JIT options below, which force compilation. This test group
covers AES/GCM in SunJCE.
-Xcomp -ea -esa -XX:CompileThreshold=100 -XX:+TieredCompilation
-Xcomp -ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation
Performance-wise, I compared results from OpenJDK's microbenchmarks. Both
AESGCMBench and AESGCMByteBuffer show 4~5x speedups. No regression was
identified.
Benchmark            (dataSize)  (keyLength)  (provider)  Mode   Cnt       Score       Error  Units  Speedup
AESGCMBench.decrypt        1024          128              thrpt   40   239163.89  ± 1160.304  ops/s     4.84
AESGCMBench.decrypt        1500          128              thrpt   40  161903.415  ± 2401.411  ops/s     4.70
AESGCMBench.decrypt        4096          128              thrpt   40   65811.465   ± 201.468  ops/s     5.08
AESGCMBench.decrypt       16384          128              thrpt   40   17352.055   ± 159.804  ops/s     5.26
AESGCMBench.encrypt        1024          128              thrpt   40  256015.225  ± 1672.668  ops/s     5.02
AESGCMBench.encrypt        1500          128              thrpt   40  178494.592  ± 1039.942  ops/s     5.08
AESGCMBench.encrypt        4096          128              thrpt   40   69738.953    ± 734.39  ops/s     5.26
AESGCMBench.encrypt       16384          128              thrpt   40   18074.523   ± 123.812  ops/s     5.42
Baseline, with the intrinsic disabled:
MICRO="VM_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseGHASHIntrinsics"
Benchmark            (dataMethod)  (dataSize)  (keyLength)  (provider)  Mode   Cnt      Score     Error  Units
AESGCMBench.decrypt                      1024          128              thrpt   40  49461.421  ± 69.308  ops/s
AESGCMBench.decrypt                      1500          128              thrpt   40  34428.813  ± 72.038  ops/s
AESGCMBench.decrypt                      4096          128              thrpt   40  12963.692   ± 17.79  ops/s
AESGCMBench.decrypt                     16384          128              thrpt   40   3300.073   ± 6.289  ops/s
AESGCMBench.encrypt                      1024          128              thrpt   40  50966.635  ± 35.366  ops/s
AESGCMBench.encrypt                      1500          128              thrpt   40  35140.397  ± 60.898  ops/s
AESGCMBench.encrypt                      4096          128              thrpt   40  13249.987  ± 12.158  ops/s
AESGCMBench.encrypt                     16384          128              thrpt   40   3337.749   ± 8.104  ops/s
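The trailing number on each row of the first table appears to be the ratio of the patched score to the score measured with -XX:-UseGHASHIntrinsics. A small sketch to recompute it from the figures quoted in this message (the class name GhashSpeedup is hypothetical, and only a few rows are copied for brevity):

```java
// Hypothetical helper, for illustration only.
final class GhashSpeedup {
    // Ratio of throughput with the intrinsic to throughput without it (both in ops/s).
    static double speedup(double withIntrinsic, double baseline) {
        return withIntrinsic / baseline;
    }

    public static void main(String[] args) {
        // {dataSize, score with intrinsic, score without}, copied from the tables above.
        double[][] rows = {
            {1024, 239163.89, 49461.421},   // decrypt
            {16384, 17352.055, 3300.073},   // decrypt
            {1024, 256015.225, 50966.635},  // encrypt
            {16384, 18074.523, 3337.749},   // encrypt
        };
        for (double[] row : rows) {
            System.out.printf("dataSize=%5d  speedup=%.2f%n",
                              (int) row[0], speedup(row[1], row[2]));
        }
    }
}
```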
Call out:
I noticed that jdk8u with the patch is 25%~30% faster than jdk-tip on
aarch64, e.g.:
--jdk8u-with-ghash_patch (UseG1GC)
Benchmark            (dataSize)  (keyLength)  (provider)  Mode   Cnt       Score       Error  Units
AESGCMBench.decrypt        1024          128              thrpt   40  237136.403  ± 1466.344  ops/s
AESGCMBench.encrypt        1024          128              thrpt   40  255047.697  ± 1112.361  ops/s
--JDK-Tip
AESGCMBench.decrypt        1024          128              thrpt   40  179547.255   ± 911.374  ops/s
AESGCMBench.encrypt        1024          128              thrpt   40  177891.031   ± 463.519  ops/s
The bottleneck in tip is 'com.sun.crypto.provider.GCTR::doFinal', caused by
JDK-8177784. Aleksey also noticed this. I think JDK-8271567 can close this
gap in tip.
thanks
--lx