Integrated: 8337666: AArch64: SHA3 GPR intrinsic
Dmitry Chuyko
dchuyko at openjdk.org
Thu Jun  5 14:31:00 UTC 2025

On Wed, 26 Mar 2025 15:55:59 GMT, Dmitry Chuyko <dchuyko at openjdk.org> wrote:
> This is an implementation of SHA3 intrinsics for AArch64 that operates on general-purpose registers (GPRs). It follows the algorithm of the Java implementation but eagerly uses the available registers; for example, FP and R18 are used when that is allowed. On simpler cores such as the Raspberry Pi 3 or the Surface Pro, it is 23-53% faster than the C2-compiled version. On Graviton 3 it is 8-14% faster than the C2-compiled version, which is itself faster than the current intrinsic. On Apple Silicon it is faster than the C2-compiled version but slower than the existing intrinsic that uses the ARMv8.2-SHA extension. The improvement on a particular CPU depends on the input length. For instance, on Graviton 2:
>
> Graviton 2:
> Benchmark                    (digesterName)  (length)  Improvement
> MessageDigests.digest              SHA3-256        64       28.28%
> MessageDigests.digest              SHA3-256     16384       53.58%
> MessageDigests.digest              SHA3-512        64       27.97%
> MessageDigests.digest              SHA3-512     16384       43.90%
> MessageDigests.getAndDigest        SHA3-256        64       26.18%
> MessageDigests.getAndDigest        SHA3-256     16384       52.82%
> MessageDigests.getAndDigest        SHA3-512        64       24.73%
> MessageDigests.getAndDigest        SHA3-512     16384       44.31%
>
> (Results for intermediate input lengths follow a step-like pattern.)
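The step-like pattern for intermediate input lengths is consistent with how SHA3 processes input, although the message itself does not spell this out. The sponge construction absorbs data in fixed-size blocks (the rate) and runs one Keccak-f[1600] permutation per block, so the cost rises in discrete jumps rather than smoothly. The following is a minimal sketch that counts permutations per message; the class and method names are hypothetical, and it ignores any per-call overhead in `MessageDigest` itself.

```java
/** Hypothetical helper: counts Keccak-f[1600] calls needed to hash a message with SHA3-d. */
public class Sha3Blocks {
    // FIPS 202: capacity = 2 * d bits, rate = 1600 - capacity bits.
    static int rateBytes(int digestBits) {
        return (1600 - 2 * digestBits) / 8;
    }

    // Padding always appends at least one byte, so a message of len bytes
    // fills len / rate + 1 blocks, and each absorbed block costs one permutation.
    // SHA3-256 and SHA3-512 outputs fit in a single rate block, so squeezing adds none.
    static int permutations(int len, int digestBits) {
        return len / rateBytes(digestBits) + 1;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {256, 512}) {
            for (int len : new int[] {64, 16384}) {
                System.out.printf("SHA3-%d length=%d rate=%d permutations=%d%n",
                        bits, len, rateBytes(bits), permutations(len, bits));
            }
        }
    }
}
```

For the benchmarked lengths this gives one permutation at 64 bytes for both digests, and 121 (SHA3-256, rate 136 bytes) versus 228 (SHA3-512, rate 72 bytes) permutations at 16384 bytes.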
> 
> On Graviton 4 there is still a noticeable difference between the proposed implementation and C2-generated code:
>
> Graviton 4:
> Benchmark                    (digesterName)  (length)  Improvement
> MessageDigests.digest              SHA3-256        64         8.3%
> MessageDigests.digest              SHA3-256     16384          11%
> MessageDigests.digest              SHA3-512        64         8.4%
> MessageDigests.digest              SHA3-512     16384        11.5%
> MessageDigests.getAndDigest        SHA3-256        64         7.2%
> MessageDigests.getAndDigest        SHA3-256     16384          11%
> MessageDigests.getAndDigest        SHA3-512        64         7.3%
> MessageDigests.getAndDigest        SHA3-512     16384        11.6%
>
> and the existing intrinsic, which uses the SHA3 extension, is ~1.8x slower than C2.
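The quoted description says the GPR intrinsic follows the algorithm of the Java implementation. For reference, the following is a minimal plain-Java sketch of that algorithm: the Keccak-f[1600] permutation over a 5x5 state of 25 `long` lanes, with SHA3-256 built on top. It is not the JDK's own source. The names `KeccakSketch`, `keccakF1600`, `le64`, `sha3_256` and `matchesJdk` are hypothetical, and the sketch is only cross-checked against the JDK's `SHA3-256` `MessageDigest` provider.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical sketch: plain-Java Keccak-f[1600] over 25 long lanes, plus SHA3-256 on top of it. */
public class KeccakSketch {
    // Iota round constants (FIPS 202), one per round.
    static final long[] RC = {
        0x0000000000000001L, 0x0000000000008082L, 0x800000000000808AL, 0x8000000080008000L,
        0x000000000000808BL, 0x0000000080000001L, 0x8000000080008081L, 0x8000000000008009L,
        0x000000000000008AL, 0x0000000000000088L, 0x0000000080008009L, 0x000000008000000AL,
        0x000000008000808BL, 0x800000000000008BL, 0x8000000000008089L, 0x8000000000008003L,
        0x8000000000008002L, 0x8000000000000080L, 0x000000000000800AL, 0x800000008000000AL,
        0x8000000080008081L, 0x8000000000008080L, 0x0000000080000001L, 0x8000000080008008L
    };
    // Rho rotation offsets, indexed by lane position x + 5 * y.
    static final int[] ROT = {
        0, 1, 62, 28, 27, 36, 44, 6, 55, 20, 3, 10, 43,
        25, 39, 41, 45, 15, 21, 8, 18, 2, 61, 56, 14
    };

    /** Applies the 24-round Keccak-f[1600] permutation to the state in place. */
    static void keccakF1600(long[] a) {
        long[] c = new long[5];
        long[] b = new long[25];
        for (int round = 0; round < 24; round++) {
            // Theta: XOR each lane with the parities of two neighbouring columns.
            for (int x = 0; x < 5; x++) {
                c[x] = a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20];
            }
            for (int x = 0; x < 5; x++) {
                long d = c[(x + 4) % 5] ^ Long.rotateLeft(c[(x + 1) % 5], 1);
                for (int y = 0; y < 25; y += 5) {
                    a[x + y] ^= d;
                }
            }
            // Rho and pi: rotate each lane and move it from (x, y) to (y, 2x + 3y).
            for (int x = 0; x < 5; x++) {
                for (int y = 0; y < 5; y++) {
                    b[y + 5 * ((2 * x + 3 * y) % 5)] = Long.rotateLeft(a[x + 5 * y], ROT[x + 5 * y]);
                }
            }
            // Chi: the only non-linear step, applied row by row.
            for (int y = 0; y < 25; y += 5) {
                for (int x = 0; x < 5; x++) {
                    a[x + y] = b[x + y] ^ (~b[(x + 1) % 5 + y] & b[(x + 2) % 5 + y]);
                }
            }
            // Iota: break symmetry with the round constant.
            a[0] ^= RC[round];
        }
    }

    /** Reads 8 bytes starting at off as a little-endian long (SHA3 lane byte order). */
    static long le64(byte[] buf, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (buf[off + i] & 0xFFL);
        }
        return v;
    }

    /** SHA3-256: rate 136 bytes, SHA3 domain padding 0x06 ... 0x80, 32-byte output. */
    static byte[] sha3_256(byte[] msg) {
        final int rate = 136;
        int padded = (msg.length / rate + 1) * rate;
        byte[] buf = Arrays.copyOf(msg, padded);
        buf[msg.length] ^= 0x06;
        buf[padded - 1] ^= (byte) 0x80;
        long[] state = new long[25];
        for (int off = 0; off < padded; off += rate) {
            for (int i = 0; i < rate / 8; i++) {
                state[i] ^= le64(buf, off + 8 * i);
            }
            keccakF1600(state);
        }
        byte[] out = new byte[32];
        for (int i = 0; i < 32; i++) {
            out[i] = (byte) (state[i / 8] >>> (8 * (i % 8)));
        }
        return out;
    }

    /** Cross-checks the sketch against the JDK's own SHA3-256 provider. */
    static boolean matchesJdk(byte[] msg) {
        try {
            return Arrays.equals(sha3_256(msg), MessageDigest.getInstance("SHA3-256").digest(msg));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The 25 lanes plus temporaries already claim most of AArch64's 31 general-purpose registers. That is presumably why the description mentions also using FP and R18 wherever the platform allows it.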
> 
> The existing intrinsic implementation is placed under a flag, `UseSIMDForSHA3Intrinsic`, which is on by default wherever the intrinsic is currently enabled.
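One way to check which SHA3 flags a given VM defines, and their current values, is the `HotSpotDiagnosticMXBean` from the `jdk.management` module. The following is a hypothetical helper; `FlagReport` is not part of the JDK. It assumes `UseSIMDForSHA3Intrinsic` is defined only by AArch64 builds that include this change, so it reports the flag as undefined elsewhere instead of failing. `UseSHA3Intrinsics` is the pre-existing, platform-independent switch for SHA3 intrinsics and may be absent on older JDKs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/** Hypothetical helper: prints the current value of SHA3-related HotSpot flags, if the VM defines them. */
public class FlagReport {
    static Optional<String> flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // getVMOption throws IllegalArgumentException when the VM has no flag with this name,
            // e.g. (by assumption) UseSIMDForSHA3Intrinsic on non-AArch64 builds or older JDKs.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UseSHA3Intrinsics", "UseSIMDForSHA3Intrinsic"}) {
            System.out.println(name + " = " + flag(name).orElse("(not defined by this VM)"));
        }
    }
}
```

For example, you could run `java -XX:-UseSIMDForSHA3Intrinsic FlagReport` on AArch64 hardware with this change to confirm that the SIMD variant is switched off. This assumes the flag can be set without an unlock option.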
> 
> Sanity tests were modified to cover the new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic` combined with both `-XX:+PreserveFramePointer` and `-XX:-PreserveFramePointer`) on AArch64 hardware. Existing test cases in which the intrinsic is enabled are run with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`; on platforms without the SHA3 extension they are still filtered out by the `isSHA3IntrinsicAvailable()` predicate...
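The sanity-test variants described above amount to three JVM option sets: the existing SIMD intrinsic, and the GPR intrinsic run once with and once without `PreserveFramePointer`. The second split presumably matters because FP can serve as an extra scratch register only when frame pointers are not preserved. The following is a hypothetical sketch that spells out those option sets; it is not the actual jtreg test code, and `Sha3TestVariants` and `variants()` are illustrative names.

```java
import java.util.List;

/** Hypothetical sketch of the JVM option sets used to exercise each SHA3 intrinsic variant. */
public class Sha3TestVariants {
    static List<List<String>> variants() {
        return List.of(
            // Existing intrinsic (SHA3 extension); the option is ignored where the VM does not know it.
            List.of("-XX:+IgnoreUnrecognizedVMOptions", "-XX:+UseSIMDForSHA3Intrinsic"),
            // GPR intrinsic with FP reserved as the frame pointer.
            List.of("-XX:-UseSIMDForSHA3Intrinsic", "-XX:+PreserveFramePointer"),
            // GPR intrinsic with FP presumably free for use as a scratch register.
            List.of("-XX:-UseSIMDForSHA3Intrinsic", "-XX:-PreserveFramePointer"));
    }

    public static void main(String[] args) {
        // Print each option set as it would appear on a java command line.
        variants().forEach(v -> System.out.println(String.join(" ", v)));
    }
}
```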
This pull request has now been integrated.
Changeset: 23f1d4f9
Author:    Dmitry Chuyko <dchuyko at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/23f1d4f9a993033596ff17751c877f2bb3f792ed
Stats:     749 lines in 6 files changed: 743 ins; 0 del; 6 mod
8337666: AArch64: SHA3 GPR intrinsic
Reviewed-by: aph
-------------
PR: https://git.openjdk.org/jdk/pull/24260
    
    
More information about the hotspot-dev mailing list