Withdrawn: 8337666: AArch64: SHA3 GPR intrinsic

duke duke at openjdk.org
Fri Oct 25 17:30:14 UTC 2024


On Thu, 1 Aug 2024 14:38:12 GMT, Dmitry Chuyko <dchuyko at openjdk.org> wrote:

> This is an implementation of SHA3 intrinsics for AArch64 that operates on GPRs. It follows the algorithm of the Java implementation but eagerly uses all available registers. For example, FP and R18 are used when that is allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than the C2-compiled version; on Graviton 3 it is 8-14% faster than the C2-compiled version (which is itself faster than the current intrinsic); on Apple Silicon it is faster than the C2-compiled version but slower than the ARMv8.2-SHA intrinsic. The improvement on a particular CPU depends on the input length. For instance, for Graviton 2:
> 
> 
> Benchmark (ops/ms)            (digesterName)  (length)  G2
> MessageDigests.digest         SHA3-256        64        28.28%
> MessageDigests.digest         SHA3-256        16384     53.58%
> MessageDigests.digest         SHA3-512        64        27.97%
> MessageDigests.digest         SHA3-512        16384     43.90%
> MessageDigests.getAndDigest   SHA3-256        64        26.18%
> MessageDigests.getAndDigest   SHA3-256        16384     52.82%
> MessageDigests.getAndDigest   SHA3-512        64        24.73%
> MessageDigests.getAndDigest   SHA3-512        16384     44.31%
> 
> 
> (results for intermediate input lengths look like steps)
> 
> The existing intrinsic implementation is placed under a flag, `UseSIMDForSHA3Intrinsic`, which is on by default wherever the intrinsic is currently enabled.
> 
> Sanity tests were modified to cover the new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on AArch64 hardware. Existing test cases where the intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`; on platforms where the SHA3 extension is missing, they are still cut off by the `isSHA3IntrinsicAvailable()` predicate.
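
For readers unfamiliar with the operation being measured, the following is a minimal sketch of the kind of call the quoted `MessageDigests` benchmark rows exercise: a SHA3-256 or SHA3-512 digest over inputs of 64 and 16384 bytes through the standard `java.security.MessageDigest` API. It is not the JMH benchmark itself and does not measure throughput; the class name `Sha3DigestSketch` and the helper `digestOnce` are hypothetical names chosen for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class Sha3DigestSketch {
    // Hypothetical helper: obtain a fresh MessageDigest and hash the input once.
    // SHA3-256 and SHA3-512 are standard algorithm names since Java 9.
    static byte[] digestOnce(String algorithm, byte[] input) {
        try {
            return MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            // Wrapped so callers do not need to handle a checked exception.
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    public static void main(String[] args) {
        // Same digester names and input lengths as in the quoted table.
        for (String algorithm : new String[] {"SHA3-256", "SHA3-512"}) {
            for (int length : new int[] {64, 16384}) {
                byte[] input = new byte[length];
                new Random(42L).nextBytes(input); // arbitrary fixed seed for repeatability
                byte[] hash = digestOnce(algorithm, input);
                System.out.println(algorithm + " length=" + length
                        + " digest bytes=" + hash.length);
            }
        }
    }
}
```

Whether a call like this reaches an intrinsic at all depends on the JVM build, the CPU, and JIT warm-up. The `UseSIMDForSHA3Intrinsic` flag named in the description exists only in the withdrawn change, so other JVMs would not recognize it unless `-XX:+IgnoreUnrecognizedVMOptions` is also passed.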

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/20422


More information about the hotspot-dev mailing list