RFR: 8337666: AArch64: SHA3 GPR intrinsic

Dmitry Chuyko dchuyko at openjdk.org
Wed Mar 26 16:02:32 UTC 2025


This is an implementation of SHA3 intrinsics for AArch64 that operates on GPRs. It follows the algorithm of the Java implementation but eagerly uses all available registers. For example, FP and R18 are used when that is allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than the C2-compiled version; on Graviton 3 it is 8-14% faster than the C2-compiled version (which is itself faster than the current intrinsic); on Apple Silicon it is faster than the C2-compiled version but slower than the ARMv8.2-SHA intrinsic. The improvement on a particular CPU depends on the input length. For instance, for Graviton 2:


Benchmark (ops/ms)           (digesterName)  (length)  G2
MessageDigests.digest              SHA3-256        64  28.28%
MessageDigests.digest              SHA3-256     16384  53.58%
MessageDigests.digest              SHA3-512        64  27.97%
MessageDigests.digest              SHA3-512     16384  43.90%
MessageDigests.getAndDigest        SHA3-256        64  26.18%
MessageDigests.getAndDigest        SHA3-256     16384  52.82%
MessageDigests.getAndDigest        SHA3-512        64  24.73%
MessageDigests.getAndDigest        SHA3-512     16384  44.31%


(results for intermediate input lengths look like steps)
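One plausible reading of the step shape (an assumption, not something measured here) is that the cost is dominated by the number of Keccak-f[1600] permutations, which only changes when the input crosses a multiple of the sponge rate. A minimal sketch of that arithmetic, with the hypothetical helper names `rateBytes` and `permutations`:

```java
public class Sha3Blocks {
    // Sponge rate in bytes for SHA3-<digestBits>:
    // the 1600-bit state minus a capacity of 2 * digestBits.
    static int rateBytes(int digestBits) {
        return (1600 - 2 * digestBits) / 8;
    }

    // Number of Keccak-f[1600] permutations needed to absorb `length` bytes.
    // SHA3 padding always appends at least one byte, so an input that exactly
    // fills its last block still costs one additional permutation.
    static int permutations(int digestBits, int length) {
        return length / rateBytes(digestBits) + 1;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {256, 512}) {
            for (int length : new int[] {64, 16384}) {
                System.out.printf("SHA3-%d length=%d rate=%d permutations=%d%n",
                        bits, length, rateBytes(bits), permutations(bits, length));
            }
        }
    }
}
```

Under this model both digests need a single permutation for 64-byte inputs, so the short-input rows mostly reflect fixed per-call overhead rather than the permutation itself.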

On Graviton 4 there is still a noticeable difference between the proposed implementation and C2 generated code:


Benchmark                    (digesterName)  (length)  Pct
MessageDigests.digest              SHA3-256        64     8.3%
MessageDigests.digest              SHA3-256     16384     11%
MessageDigests.digest              SHA3-512        64     8.4%
MessageDigests.digest              SHA3-512     16384     11.5%
MessageDigests.getAndDigest        SHA3-256        64     7.2%
MessageDigests.getAndDigest        SHA3-256     16384     11%
MessageDigests.getAndDigest        SHA3-512        64     7.3%
MessageDigests.getAndDigest        SHA3-512     16384     11.6%


and the version that uses the SHA3 extension is ~1.8x slower than the C2-generated code.

The existing intrinsic implementation is put under a flag, `UseSIMDForSHA3Intrinsic`, which is on by default wherever the intrinsic is currently enabled.

Sanity tests were modified to cover the new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on AArch64 hardware. Existing test cases where the intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`; on platforms where the SHA3 extension is missing, they are still cut off by the `isSHA3IntrinsicAvailable()` predicate.
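For a quick manual check outside the test suite, something like the following can be run once with `-XX:-UseSIMDForSHA3Intrinsic` and once with `-XX:+UseSIMDForSHA3Intrinsic`, and the printed digests compared. This is only a sketch: the class name `Sha3Probe`, the helper names, and the iteration count are made up for illustration, and the loop merely assumes that repeating the call often enough lets the JIT-compiled path take over.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class Sha3Probe {
    // Digest `length` zero bytes with the given algorithm (e.g. "SHA3-256").
    static byte[] digest(String algorithm, int length) {
        try {
            return MessageDigest.getInstance(algorithm).digest(new byte[length]);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lower-case hex encoding of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"SHA3-256", "SHA3-512"}) {
            for (int length : new int[] {64, 16384}) {
                byte[] first = digest(algorithm, length);
                // Repeat so that later calls are likely to run compiled code
                // (illustrative count); the result must never change.
                for (int i = 0; i < 20_000; i++) {
                    if (!Arrays.equals(first, digest(algorithm, length))) {
                        throw new AssertionError(algorithm + " mismatch at iteration " + i);
                    }
                }
                System.out.println(algorithm + " " + length + " " + hex(first));
            }
        }
    }
}
```

On a build without this change, `-XX:+IgnoreUnrecognizedVMOptions` has to be added for the JVM to accept the new flag.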

The original PR https://github.com/openjdk/jdk/pull/20422 has been auto-closed and the branch has been re-created on top of the new master.

-------------

Commit messages:
 - Delete empty line
 - SHA3 GPR intrinsic & tests

Changes: https://git.openjdk.org/jdk/pull/24260/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24260&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337666
  Stats: 757 lines in 5 files changed: 752 ins; 1 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/24260.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24260/head:pull/24260

PR: https://git.openjdk.org/jdk/pull/24260

