RFR: 8337666: AArch64: SHA3 GPR intrinsic [v4]
Andrew Haley
aph at openjdk.org
Sat May 31 08:43:54 UTC 2025
On Fri, 30 May 2025 21:21:08 GMT, Dmitry Chuyko <dchuyko at openjdk.org> wrote:
>> This is an implementation of SHA3 intrinsics for AArch64 that operates on GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than the C2-compiled version; on Graviton 3 it is 8-14% faster than the C2-compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than the C2-compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length. For instance, for Graviton 2:
>>
>>
>> Benchmark (ops/ms)           (digesterName)  (length)  G2
>> MessageDigests.digest        SHA3-256        64        28.28%
>> MessageDigests.digest        SHA3-256        16384     53.58%
>> MessageDigests.digest        SHA3-512        64        27.97%
>> MessageDigests.digest        SHA3-512        16384     43.90%
>> MessageDigests.getAndDigest  SHA3-256        64        26.18%
>> MessageDigests.getAndDigest  SHA3-256        16384     52.82%
>> MessageDigests.getAndDigest  SHA3-512        64        24.73%
>> MessageDigests.getAndDigest  SHA3-512        16384     44.31%
>>
>>
>> (results for intermediate input lengths look like steps)
>>
>> On Graviton 4 there is still a noticeable difference between the proposed implementation and C2 generated code:
>>
>>
>> Benchmark                    (digesterName)  (length)  Pct
>> MessageDigests.digest        SHA3-256        64        8.3%
>> MessageDigests.digest        SHA3-256        16384     11%
>> MessageDigests.digest        SHA3-512        64        8.4%
>> MessageDigests.digest        SHA3-512        16384     11.5%
>> MessageDigests.getAndDigest  SHA3-256        64        7.2%
>> MessageDigests.getAndDigest  SHA3-256        16384     11%
>> MessageDigests.getAndDigest  SHA3-512        64        7.3%
>> MessageDigests.getAndDigest  SHA3-512        16384     11.6%
>>
>>
>> and the version that uses the extension is ~1.8x slower than C2
>>
>> The existing intrinsic implementation is put under a flag, `UseSIMDForSHA3Intrinsic`, which is on by default wherever the intrinsic is currently enabled.
>>
>> Sanity tests were modified to cover the new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on aarch64 hardware. Existing test cases where the intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`; on platforms where the SHA3 extension is missing, they are still cut off by the isSHA3IntrinsicAvailable() predicate.
>>
>> The original PR https://github.com/openjdk/jdk/pull/20422 has been auto-closed and the branch has...
>
> Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
>
> - Merge branch 'openjdk:master' into JDK-8337666
> - Assert message
> - Copyright year
> - Review suggestions
> - Merge master
> - Delete empty line
> - SHA3 GPR intrinsic & tests
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 331:
> 329:
> 330: inline void rol(Register Rd, Register Rn, unsigned imm) {
> 331: extr(Rd, Rn, Rn, ((64 - imm) & 63));
Suggestion:
extr(Rd, Rn, Rn, (64 - imm));
It's better to catch an out-of-range immediate value.
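To illustrate the point, here is a minimal standalone sketch, not HotSpot code. It models EXTR in plain C++ and assumes that an lsb outside 0..63 is rejected; `extr_model`, `rol_masked`, `rol_unmasked` and `rol_demo` are hypothetical names used only for this illustration, and `std::nullopt` stands in for whatever failure the assembler would raise.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Software model of the 64-bit AArch64 EXTR instruction: form the 128-bit
// value hi:lo and return the 64 bits starting at bit position lsb.
// Assumption: lsb outside [0, 63] is rejected; std::nullopt stands in for
// the assembler-side failure.
inline std::optional<uint64_t> extr_model(uint64_t hi, uint64_t lo, unsigned lsb) {
  if (lsb > 63) return std::nullopt;
  if (lsb == 0) return lo;              // avoid an undefined 64-bit shift
  return (hi << (64 - lsb)) | (lo >> lsb);
}

// Rotate left with the immediate masked, as in the quoted line 331.
inline std::optional<uint64_t> rol_masked(uint64_t x, unsigned imm) {
  return extr_model(x, x, (64 - imm) & 63);
}

// Rotate left without the mask, as in the suggestion.
inline std::optional<uint64_t> rol_unmasked(uint64_t x, unsigned imm) {
  return extr_model(x, x, 64 - imm);
}

// Hypothetical demo: both forms agree for an in-range immediate, but only
// the unmasked form rejects an out-of-range one.
inline void rol_demo() {
  const uint64_t x = 0x8000000000000001ULL;
  assert(rol_masked(x, 1) == rol_unmasked(x, 1));  // in range: identical
  assert(rol_masked(x, 65).has_value());           // 65 silently becomes 1
  assert(!rol_unmasked(x, 65).has_value());        // 65 is rejected
}
```

Under this model the unmasked form also rejects imm == 0, because 64 - 0 = 64 is itself out of range, whereas the masked form quietly turns it into a rotate by zero.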
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7412:
> 7410: __ ldr(state, Address(sp, 112));
> 7411: }
> 7412: // saving calculated sha3 state
Suggestion:
// save calculated sha3 state
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24260#discussion_r2117560706
PR Review Comment: https://git.openjdk.org/jdk/pull/24260#discussion_r2117562638