RFR: 8337666: AArch64: SHA3 GPR intrinsic

Thu Apr 24 20:26:53 UTC 2025

On Thu, 24 Apr 2025 17:05:59 GMT, Andrew Haley <aph at openjdk.org> wrote:

> > > So, this is for two now-discontinued computers? Does this patch improve performance on any recently-available hardware, or is it purely for retrocomputers?
> > 
> > 
> > I'd not call Graviton 4,3,2 retro.
> 
> I'm trying to understand what you wrote, which is very confusing. Reading it again, on Graviton 3 it is 8-14% faster than the existing fastest implementation.

Correct. For the newest Graviton 4, the hope was to see either no difference between this version and the C2-generated code, or 8252204 being faster than the C2-generated code. However, on Graviton 4 this version is still 7-12% faster than the C2-generated code, which in turn is still faster than 8252204.

> 
> I don't think we should maintain multiple implementations of SHA-3 unless there is a convincing advantage one way or the other. I certainly don't want to see a precedent where we have special versions for crypto algorithms for each microarchitecture. Is 8252204 much faster than this one on Apple silicon? It would be great if we could ditch that one.

Even on M1, 8252204 is 28-32% faster than this one. Apple cores seem to have 4 execution blocks per core for the accelerator instructions (unlike server cores, which may provide just 1 unit).

It would be great if C2 could allocate scratch registers in such methods, but that would complicate the entire port.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24260#issuecomment-2828770547