(8u) RFR: 8131062: aarch64: add support for GHASH acceleration

Hohensee, Paul hohensee at amazon.com
Tue Aug 17 23:50:38 UTC 2021


Lgtm.

Thanks,
Paul

-----Original Message-----
From: jdk8u-dev <jdk8u-dev-retn at openjdk.java.net> on behalf of "Liu, Xin" <xxinliu at amazon.com>
Date: Monday, August 16, 2021 at 11:00 AM
To: "jdk8u-dev at openjdk.java.net" <jdk8u-dev at openjdk.java.net>
Subject: (8u) RFR: 8131062: aarch64: add support for GHASH acceleration

Hi,

I'd like to request a review of 8131062 for jdk8u.  This patch can
accelerate AES/GCM by 4~5x on armv8 by leveraging NEON instructions.
It doesn't apply to jdk8u/hotspot cleanly, but it's trivial to integrate:
I only adjusted code locations.


Bug: https://bugs.openjdk.java.net/browse/JDK-8131062
webrev: https://cr.openjdk.java.net/~xliu/8131062/webrev/


Background:
GHASH.update is a step in AES/GCM that authenticates the encrypted data
and the additional authenticated data (AAD). The cost of authentication
may dominate the AES/GCM cryptographic procedure because:
    1. HotSpot already uses the dedicated crypto instructions to
accelerate AES.
    2. Applications feed long AAD. In particular, GMAC is the
degenerate form of AES/GCM, which has empty encrypted data.
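
For readers unfamiliar with GHASH, here is a minimal Java sketch of the per-block step as described in NIST SP 800-38D: the running state is XORed with each 16-byte block and then multiplied by the hash subkey H in GF(2^128). The names GhashSketch, gfMultiply and update are hypothetical, and this is not the code in the webrev; it only illustrates the kind of bit-serial loop that the patch accelerates with NEON instructions.

```java
final class GhashSketch {
    // High 64 bits of the reduction constant R = 11100001 || 0^120 (NIST SP 800-38D).
    private static final long R_HI = 0xE100000000000000L;

    /** Multiplies two 128-bit blocks, each given as (high, low) longs, in GF(2^128). */
    static long[] gfMultiply(long xHi, long xLo, long yHi, long yLo) {
        long zHi = 0, zLo = 0;
        long vHi = yHi, vLo = yLo;
        for (int i = 0; i < 128; i++) {
            // Bits of X are consumed from the most significant bit downwards.
            long word = (i < 64) ? xHi : xLo;
            if (((word >>> (63 - (i & 63))) & 1L) != 0) {
                zHi ^= vHi;
                zLo ^= vLo;
            }
            // Shift V right by one bit and reduce if a bit fell off the low end.
            boolean carry = (vLo & 1L) != 0;
            vLo = (vLo >>> 1) | (vHi << 63);
            vHi >>>= 1;
            if (carry) {
                vHi ^= R_HI;
            }
        }
        return new long[] { zHi, zLo };
    }

    /** One GHASH step: state = (state XOR block) * H. */
    static long[] update(long[] state, long[] block, long[] h) {
        return gfMultiply(state[0] ^ block[0], state[1] ^ block[1], h[0], h[1]);
    }
}
```

Each 16-byte block costs 128 loop iterations in this form, which is why the step becomes the hot spot once the AES part is already hardware-accelerated.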

In one application, we have seen 85% of the time in the AES/GCM kernel
(com.sun.crypto.provider.GaloisCounterMode.doLastBlock) spent in
GHASH.update on ARMv8.
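
An AAD-only workload of the kind mentioned above can be sketched with the standard javax.crypto API. This is an illustrative example, not the application referred to in this mail; the class name GmacSketch and the method gmac are hypothetical. With no plaintext, the output of doFinal() is just the authentication tag, so the AAD is processed only by the authentication (GHASH) path.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class GmacSketch {
    /** Computes a 16-byte GMAC tag: AES/GCM over AAD only, with no plaintext. */
    static byte[] gmac(byte[] key, byte[] iv, byte[] aad) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE,
                        new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(128, iv)); // 128-bit tag length
            cipher.updateAAD(aad);   // authenticated but not encrypted
            return cipher.doFinal(); // no ciphertext, so this is only the tag
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In real use the IV must never repeat for a given key; any fixed key or IV used to try this out is for demonstration only.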


Test:
Passed all tiers of tests.  In particular, I ran the test group
jdk_security2 with the options below, which force JIT compilation. This
test group covers AES/GCM in SunJCE.

-Xcomp -ea -esa -XX:CompileThreshold=100 -XX:+TieredCompilation
-Xcomp -ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation

Performance-wise, I compared results from OpenJDK's microbenchmarks. Both
AESGCMBench and AESGCMByteBuffer show 4~5x speedups. No regression was
identified.

Benchmark            (dataSize)  (keyLength)  (provider)   Mode  Cnt       Score      Error  Units  Speedup
AESGCMBench.decrypt        1024          128              thrpt   40   239163.89 ± 1160.304  ops/s     4.84
AESGCMBench.decrypt        1500          128              thrpt   40  161903.415 ± 2401.411  ops/s     4.70
AESGCMBench.decrypt        4096          128              thrpt   40   65811.465 ±  201.468  ops/s     5.08
AESGCMBench.decrypt       16384          128              thrpt   40   17352.055 ±  159.804  ops/s     5.26
AESGCMBench.encrypt        1024          128              thrpt   40  256015.225 ± 1672.668  ops/s     5.02
AESGCMBench.encrypt        1500          128              thrpt   40  178494.592 ± 1039.942  ops/s     5.08
AESGCMBench.encrypt        4096          128              thrpt   40   69738.953 ±   734.39  ops/s     5.26
AESGCMBench.encrypt       16384          128              thrpt   40   18074.523 ±  123.812  ops/s     5.42

MICRO="VM_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseGHASHIntrinsics"
Benchmark            (dataMethod)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score    Error  Units
AESGCMBench.decrypt                      1024          128              thrpt   40  49461.421 ± 69.308  ops/s
AESGCMBench.decrypt                      1500          128              thrpt   40  34428.813 ± 72.038  ops/s
AESGCMBench.decrypt                      4096          128              thrpt   40  12963.692 ±  17.79  ops/s
AESGCMBench.decrypt                     16384          128              thrpt   40   3300.073 ±  6.289  ops/s
AESGCMBench.encrypt                      1024          128              thrpt   40  50966.635 ± 35.366  ops/s
AESGCMBench.encrypt                      1500          128              thrpt   40  35140.397 ± 60.898  ops/s
AESGCMBench.encrypt                      4096          128              thrpt   40  13249.987 ± 12.158  ops/s
AESGCMBench.encrypt                     16384          128              thrpt   40   3337.749 ±  8.104  ops/s
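
The trailing ratio on each row of the first table appears to be the score with the intrinsic divided by the score with -XX:-UseGHASHIntrinsics. A small sketch of that arithmetic, assuming this interpretation and using four of the score pairs reported above (the class name SpeedupSketch is hypothetical):

```java
final class SpeedupSketch {
    /** Throughput ratio of the run with the intrinsic over the baseline run. */
    static double speedup(double intrinsicOpsPerSec, double baselineOpsPerSec) {
        return intrinsicOpsPerSec / baselineOpsPerSec;
    }

    public static void main(String[] args) {
        String[] labels = {
            "decrypt/1024", "decrypt/16384", "encrypt/1024", "encrypt/16384"
        };
        // {score with intrinsic, score with -XX:-UseGHASHIntrinsics}, in ops/s,
        // copied from the benchmark tables in this mail.
        double[][] scores = {
            {239163.89, 49461.421},
            {17352.055, 3300.073},
            {256015.225, 50966.635},
            {18074.523, 3337.749},
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-14s %.2fx%n",
                              labels[i], speedup(scores[i][0], scores[i][1]));
        }
    }
}
```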


Call out:
I noticed that jdk8u with the patch is 25%~30% faster than jdk-tip on
aarch64, e.g.:

--jdk8u-with-ghash_patch (UseG1GC)
Benchmark            (dataSize)  (keyLength)  (provider)   Mode  Cnt       Score      Error  Units
AESGCMBench.decrypt        1024          128              thrpt   40  237136.403 ± 1466.344  ops/s
AESGCMBench.encrypt        1024          128              thrpt   40  255047.697 ± 1112.361  ops/s
--JDK-Tip
AESGCMBench.decrypt        1024          128              thrpt   40  179547.255 ±  911.374  ops/s
AESGCMBench.encrypt        1024          128              thrpt   40  177891.031 ±  463.519  ops/s

The bottleneck is 'com.sun.crypto.provider.GCTR::doFinal', caused by
JDK-8177784. Aleksey also noticed that. I think this gap can be closed
by JDK-8271567 in tip.
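
One way to read the 25%~30% figure is as the fraction by which the JDK-Tip throughput falls short of jdk8u with the patch. The sketch below computes that shortfall from the 1024-byte scores quoted above; this reading is an assumption, and the names GapSketch and shortfall are hypothetical.

```java
final class GapSketch {
    /** Fraction by which 'other' falls short of 'reference' throughput. */
    static double shortfall(double referenceOpsPerSec, double otherOpsPerSec) {
        return 1.0 - otherOpsPerSec / referenceOpsPerSec;
    }

    public static void main(String[] args) {
        // Scores in ops/s for dataSize 1024: {jdk8u with patch, JDK-Tip}.
        double[] decrypt = {237136.403, 179547.255};
        double[] encrypt = {255047.697, 177891.031};
        System.out.printf("decrypt shortfall: %.1f%%%n",
                          100.0 * shortfall(decrypt[0], decrypt[1]));
        System.out.printf("encrypt shortfall: %.1f%%%n",
                          100.0 * shortfall(encrypt[0], encrypt[1]));
    }
}
```

Under this reading the two cases come out at about 24% and 30%. Measured the other way round, as jdk8u throughput relative to tip, the advantage would be larger.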

thanks
--lx





