Integrated: 8359256: AArch64: Use SHA3 GPR intrinsic where it's faster
Boris Ulasevich
bulasevich at openjdk.org
Thu Nov 6 12:59:26 UTC 2025
On Thu, 9 Oct 2025 13:26:51 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
> This change adjusts the default selection of SHA-3 intrinsics on AArch64 based on observed performance across CPUs. In our measurements, the SHA-3 SIMD path (using SHA3 instructions) is consistently faster on Apple silicon, while on Neoverse and several older cores the GPR implementation performs better. On CPUs without SHA-3 instructions, the GPR path is the only viable option and behaves as expected.
>
> Accordingly, `UseSIMDForSHA3Intrinsic` now defaults to false globally. The SIMD variant is auto-enabled only on Apple silicon; elsewhere the default remains the GPR path.
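The selection policy described above can be sketched roughly as follows. This is an illustrative model only, not the HotSpot code from the changeset: the `Vendor` and `Path` enums and the `choose` method are hypothetical names, and letting an explicit user setting override the default is an assumption about how the flag behaves.

```java
public class Sha3PathSketch {
    // Hypothetical CPU families, for illustration only.
    enum Vendor { APPLE, NEOVERSE, OTHER }

    // The two intrinsic implementations discussed in the PR description.
    enum Path { SIMD, GPR }

    /**
     * Picks an intrinsic path.
     *
     * @param vendor              CPU family (hypothetical classification)
     * @param hasSha3Instructions whether the CPU implements the SHA3 instructions
     * @param userSimdFlag        explicit setting of the SIMD flag, or null if unset
     *                            (assumption: an explicit setting overrides the default)
     */
    static Path choose(Vendor vendor, boolean hasSha3Instructions, Boolean userSimdFlag) {
        // Without SHA3 instructions the GPR path is the only viable option.
        if (!hasSha3Instructions) {
            return Path.GPR;
        }
        // Assumed: an explicit user choice wins over the auto-selected default.
        if (userSimdFlag != null) {
            return userSimdFlag ? Path.SIMD : Path.GPR;
        }
        // Default: SIMD only on Apple silicon, GPR everywhere else.
        return vendor == Vendor.APPLE ? Path.SIMD : Path.GPR;
    }

    public static void main(String[] args) {
        System.out.println("Apple, SHA3, flag unset:    " + choose(Vendor.APPLE, true, null));
        System.out.println("Neoverse, SHA3, flag unset: " + choose(Vendor.NEOVERSE, true, null));
        System.out.println("Other, no SHA3, flag unset: " + choose(Vendor.OTHER, false, null));
    }
}
```

The ordering of the checks reflects the description: hardware capability first, then the default keyed on the CPU family.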
>
> _The attached raw data also includes observations about `UseFPUForSpilling`. Back in #27350 we discussed whether the option is entirely useless. While orthogonal to this change, the MessageDigests benchmark is a convenient probe of register-spilling behavior because the SHA-3 (Keccak) algorithm is highly register-hungry, which adds a significant number of spills to the generated assembly sequence. In the provided results, at least one CPU benefits from enabling `UseFPUForSpilling`, so the option seems worth keeping for now._
>
> **Cortex-A53 (RPi3)**
>
> $ ./jdk-25/bin/java -jar benchmarks.jar -p digesterName=SHA3-512 -jvmArgs "-XX:-UseFPUForSpilling -XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics -XX:TieredStopAtLevel=4" MessageDigests.digest
> Benchmark              (digesterName)  (length)  Cnt    Score   Error   Units
> MessageDigests.digest        SHA3-512        64  150  345.010 ± 0.473  ops/ms
> MessageDigests.digest        SHA3-512     16384  150    1.817 ± 0.001  ops/ms
>
> $ ./jdk-25/bin/java -jar benchmarks.jar -p digesterName=SHA3-512 -jvmArgs "-XX:+UseFPUForSpilling -XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics -XX:TieredStopAtLevel=4" MessageDigests.digest
> MessageDigests.digest        SHA3-512        64  150  352.247 ± 0.279  ops/ms   +UseFPUForSpilling: +2%
> MessageDigests.digest        SHA3-512     16384  150    1.855 ± 0.001  ops/ms   +UseFPUForSpilling: +2%
>
> $ ./jdk-25/bin/java -jar benchmarks.jar MessageDigests -p digesterName=SHA3-512 -jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics" 2>&1 | tail -n5
> Benchmark                    (digesterName)  (length)  Cnt    Score    Error   Units
> MessageDigests.digest              SHA3-512        64   15  345.552 ±  0.291  ops/ms
> MessageDigests.digest              SHA3-512     16384   15    1.818 ±  0.001  ops/ms
> MessageDigests.getAndDigest        SHA3-512        64   15  265.744 ± 56.591  ops/ms
> MessageDigests.getAndDigest SHA3-512 16384 1...
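For context, the workload in the quoted results is a SHA3-512 digest over 64-byte and 16384-byte inputs. A minimal sketch of that operation with the standard `java.security.MessageDigest` API is shown here; it is not the JMH benchmark itself, and the class name `Sha3DigestSketch`, the helper `digest`, and the zero-filled input are assumptions made for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha3DigestSketch {
    /**
     * Hashes a zero-filled buffer of the given length with the named algorithm.
     * The zero-filled input is an illustrative choice, not the benchmark's data.
     */
    static byte[] digest(String algorithm, int length) throws NoSuchAlgorithmException {
        byte[] input = new byte[length];
        MessageDigest md = MessageDigest.getInstance(algorithm);
        return md.digest(input);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // The two input lengths that appear in the quoted benchmark tables.
        for (int length : new int[] {64, 16384}) {
            byte[] hash = digest("SHA3-512", length);
            System.out.println("length=" + length + " -> " + hash.length + "-byte digest");
        }
    }
}
```

Running such a program with the JVM flags from the quoted commands (for example `-XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics`) exercises the same code path, though a single run is no substitute for a JMH measurement.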
This pull request has now been integrated.
Changeset: c173d416
Author: Boris Ulasevich <bulasevich at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/c173d416f749348bee42e1a9436a999700d0f0e8
Stats: 19 lines in 2 files changed: 6 ins; 0 del; 13 mod
8359256: AArch64: Use SHA3 GPR intrinsic where it's faster
Reviewed-by: eastigeevich, phh
-------------
PR: https://git.openjdk.org/jdk/pull/27726