RFR: 8359256: AArch64: Use SHA3 GPR intrinsic where it's faster
Evgeny Astigeevich
eastigeevich at openjdk.org
Mon Oct 27 17:26:03 UTC 2025
On Thu, 9 Oct 2025 13:26:51 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
> This change adjusts the default selection of SHA-3 intrinsics on AArch64 based on observed performance across CPUs. In our measurements, the SHA-3 SIMD path (using SHA3 instructions) is consistently faster on Apple silicon, while on Neoverse and several older cores the GPR implementation performs better. On CPUs without SHA-3 instructions, the GPR path is the only viable option and behaves as expected.
>
> Accordingly, `UseSIMDForSHA3Intrinsic` now defaults to false globally. The SIMD variant is auto-enabled only on Apple silicon; elsewhere the default remains the GPR path.
>
> _The attached raw data also includes observations about `UseFPUForSpilling`. Back in #27350 we discussed whether the option is entirely useless. While orthogonal to this change, the MessageDigests benchmark is a convenient probe of register-spilling behavior because the SHA-3 (Keccak) algorithm is highly register-hungry, which adds a significant number of spills to the generated assembly sequence. In the provided results, at least one CPU benefits from enabling UseFPUForSpilling, so the option seems worth keeping for now._
>
> **Cortex-A53 (RPi3)**
>
> $ ./jdk-25/bin/java -jar benchmarks.jar -p digesterName=SHA3-512 -jvmArgs "-XX:-UseFPUForSpilling -XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics -XX:TieredStopAtLevel=4" MessageDigests.digest
> Benchmark (digesterName) (length) Cnt Score Error Units
> MessageDigests.digest SHA3-512 64 150 345.010 ± 0.473 ops/ms
> MessageDigests.digest SHA3-512 16384 150 1.817 ± 0.001 ops/ms
>
> $ ./jdk-25/bin/java -jar benchmarks.jar -p digesterName=SHA3-512 -jvmArgs "-XX:+UseFPUForSpilling -XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics -XX:TieredStopAtLevel=4" MessageDigests.digest
> MessageDigests.digest SHA3-512 64 150 352.247 ± 0.279 ops/ms +UseFPUForSpilling: +2%
> MessageDigests.digest SHA3-512 16384 150 1.855 ± 0.001 ops/ms +UseFPUForSpilling: +2%
>
> $ ./jdk-25/bin/java -jar benchmarks.jar MessageDigests -p digesterName=SHA3-512 -jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics" 2>&1 | tail -n5
> Benchmark (digesterName) (length) Cnt Score Error Units
> MessageDigests.digest SHA3-512 64 15 345.552 ± 0.291 ops/ms
> MessageDigests.digest SHA3-512 16384 15 1.818 ± 0.001 ops/ms
> MessageDigests.getAndDigest SHA3-512 64 15 265.744 ± 56.591 ops/ms
> MessageDigests.getAndDigest SHA3-512 16384 1...
src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 397:
> 395: if (!VM_Version::supports_sha3() && UseSIMDForSHA3Intrinsic) {
> 396: warning("Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.");
> 397: FLAG_SET_DEFAULT(UseSIMDForSHA3Intrinsic, false);
The warning needs to be updated to say that SHA3 instructions are not available on this CPU. It currently says the intrinsics are not available, which is not true because we can still use the SHA3 GPR intrinsic.
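As a rough illustration of the intended behaviour, here is a standalone model of the check at lines 395-397 with a reworded message. It is only a sketch, not HotSpot code: `Sha3Selection`, `select_sha3_path` and the exact warning text are hypothetical placeholders, and the real change would keep using `warning()` and `FLAG_SET_DEFAULT`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the outcome of flag processing (not a HotSpot type).
struct Sha3Selection {
  bool use_simd;        // models UseSIMDForSHA3Intrinsic after adjustment
  std::string warning;  // empty when no warning would be printed
};

// Models the check in vm_version_aarch64.cpp: if the SIMD variant is requested
// but the CPU lacks SHA3 instructions, fall back to the GPR intrinsic and say
// so, instead of claiming that no SHA3 intrinsics are available.
Sha3Selection select_sha3_path(bool cpu_supports_sha3, bool use_simd_requested) {
  if (!cpu_supports_sha3 && use_simd_requested) {
    // Placeholder wording; the final text is up to the PR author.
    return {false,
            "SHA3 instructions are not available on this CPU; "
            "using the SHA3 GPR intrinsic instead."};
  }
  // Either the SIMD path is usable or it was not requested: nothing to report.
  return {use_simd_requested, ""};
}
```

The point is that the message names the missing CPU feature (SHA3 instructions) rather than the intrinsics, since the GPR intrinsic remains usable in that case.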
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27726#discussion_r2466480130