RFR: 8359256: AArch64: Use SHA3 GPR intrinsic where it's faster

Evgeny Astigeevich eastigeevich at openjdk.org
Mon Oct 27 17:26:03 UTC 2025


On Thu, 9 Oct 2025 13:26:51 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> This change adjusts the default selection of SHA-3 intrinsics on AArch64 based on observed performance across CPUs. In our measurements, the SHA-3 SIMD path (using SHA3 instructions) is consistently faster on Apple silicon, while on Neoverse and several older cores the GPR implementation performs better. On CPUs without SHA-3 instructions, the GPR path is the only viable option and behaves as expected.
> 
> Accordingly, `UseSIMDForSHA3Intrinsic` now defaults to false globally. The SIMD variant is auto-enabled only on Apple silicon; elsewhere the default remains the GPR path.
> 
> _The attached raw data also includes observations about `UseFPUForSpilling`. Back in #27350 we discussed whether the option is entirely useless. While orthogonal to this change, the MessageDigests benchmark is a convenient probe of register-spilling behavior because the SHA-3 (Keccak) algorithm is highly register-hungry, which adds a significant number of spills to the generated assembly sequence. In the provided results, at least one CPU benefits from enabling UseFPUForSpilling, so the option seems worth keeping for now._
> 
> **Cortex-A53 (RPi3)**
> 
> $ ./jdk-25/bin/java -jar benchmarks.jar -p digesterName=SHA3-512 -jvmArgs "-XX:-UseFPUForSpilling -XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics -XX:TieredStopAtLevel=4" MessageDigests.digest
> Benchmark          (digesterName)  (length)   Cnt    Score   Error   Units
> MessageDigests.digest    SHA3-512        64   150  345.010 ± 0.473  ops/ms
> MessageDigests.digest    SHA3-512     16384   150    1.817 ± 0.001  ops/ms
> 
> $ ./jdk-25/bin/java -jar benchmarks.jar -p digesterName=SHA3-512 -jvmArgs "-XX:+UseFPUForSpilling -XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics -XX:TieredStopAtLevel=4" MessageDigests.digest
> MessageDigests.digest    SHA3-512        64   150  352.247 ± 0.279  ops/ms  +UseFPUForSpilling: +2%
> MessageDigests.digest    SHA3-512     16384   150    1.855 ± 0.001  ops/ms  +UseFPUForSpilling: +2%
> 
> $ ./jdk-25/bin/java -jar benchmarks.jar MessageDigests -p digesterName=SHA3-512 -jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:-UseSHA3Intrinsics" 2>&1 | tail -n5
> Benchmark                (digesterName)  (length)   Cnt    Score    Error   Units
> MessageDigests.digest          SHA3-512        64    15  345.552 ±  0.291  ops/ms
> MessageDigests.digest          SHA3-512     16384    15    1.818 ±  0.001  ops/ms
> MessageDigests.getAndDigest    SHA3-512        64    15  265.744 ± 56.591  ops/ms
> MessageDigests.getAndDigest    SHA3-512     16384    1...

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 397:

> 395:   if (!VM_Version::supports_sha3() && UseSIMDForSHA3Intrinsic) {
> 396:     warning("Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.");
> 397:     FLAG_SET_DEFAULT(UseSIMDForSHA3Intrinsic, false);

The warning needs to be updated to say that the SHA3 instructions are not available. As written, it says the intrinsics are not available, which is not true: we can still use the SHA3 GPR intrinsic.
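
To make the suggestion concrete, here is a minimal standalone sketch of the check with a reworded message. It is not HotSpot code: `Sha3Config`, `Sha3Decision` and `resolve_sha3_simd` are hypothetical stand-ins for `VM_Version::supports_sha3()`, the `UseSIMDForSHA3Intrinsic` flag and the `warning()`/`FLAG_SET_DEFAULT` calls, and the warning text is only one possible wording.

```cpp
#include <string>

// Hypothetical model of the inputs to the check; not the real VM_Version API.
struct Sha3Config {
  bool cpu_has_sha3;       // stands in for VM_Version::supports_sha3()
  bool use_simd_for_sha3;  // stands in for the UseSIMDForSHA3Intrinsic flag
};

// Hypothetical result: the effective flag value plus any warning to print.
struct Sha3Decision {
  bool use_simd;
  std::string warning;     // empty when there is nothing to report
};

// Mirrors the shape of the reviewed condition: if the SIMD variant was
// requested on a CPU without SHA3 instructions, turn it off and say why.
// The message names the missing instructions, not the intrinsics, because
// the GPR intrinsic remains usable. The wording is an assumption.
Sha3Decision resolve_sha3_simd(const Sha3Config& cfg) {
  if (!cfg.cpu_has_sha3 && cfg.use_simd_for_sha3) {
    return {false,
            "SHA3 instructions are not available on this CPU; "
            "the SHA3 GPR intrinsic will be used instead."};
  }
  return {cfg.use_simd_for_sha3, ""};
}
```

With this shape, `resolve_sha3_simd({false, true})` yields a disabled SIMD flag and a non-empty warning, while `resolve_sha3_simd({true, true})` keeps the SIMD path and produces no warning.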

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27726#discussion_r2466480130


More information about the hotspot-dev mailing list