RFR: 8322770: Implement C2 VectorizedHashCode on AArch64

Mon Apr 22 15:51:28 UTC 2024

On Mon, 22 Apr 2024 14:45:49 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:

> > A high-performance AArch64 implementation can issue four multiply-accumulate vector instructions per cycle, with a 3-clock latency.
> 
> @theRealAph , hmph, could you elaborate on what spec you refer to here?

That's not so much a spec as Dougall's measured Apple M1 performance: https://dougallj.github.io/applecpu/measurements/firestorm/UMLAL_v_4S.html.
Other high-end AArch64 designs can't do that, but they won't suffer from going wider. We should be able to sustain a pipelined 4 int-wide elements per cycle.
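
To make the "going wider" point concrete: with the figures quoted above (3-cycle latency, four multiply-accumulates issued per cycle), roughly 3 x 4 = 12 independent multiply-accumulates have to be in flight to keep the units busy, so a single dependent chain of h = 31 * h + a[i] cannot get there. The following is only a scalar Java sketch of the idea, not the intrinsic in this PR: it splits the polynomial hash into independent accumulators, each stepped by 31^LANES, and recombines them at the end. The class name, the method name and LANES = 4 are hypothetical choices for illustration, and the array is assumed to be non-null.

```java
import java.util.Arrays;

// Illustrative sketch only: models vector lanes as independent int accumulators.
final class UnrolledHashSketch {
    // Hypothetical lane count, chosen for illustration; not taken from the PR.
    static final int LANES = 4;

    // Returns the same value as Arrays.hashCode(int[]) for a non-null array.
    static int hash(int[] a) {
        // step = 31^LANES; int overflow wraps, exactly as in the scalar hash.
        int step = 1;
        for (int k = 0; k < LANES; k++) {
            step *= 31;
        }

        int n = a.length;
        int main = n - (n % LANES);   // elements handled by the unrolled loop
        int[] acc = new int[LANES];   // one independent dependency chain per lane
        int seed = 1;                 // carries the initial value 1 of the scalar hash

        for (int i = 0; i < main; i += LANES) {
            seed *= step;
            for (int j = 0; j < LANES; j++) {
                // Each lane depends only on its own previous value.
                acc[j] = acc[j] * step + a[i + j];
            }
        }

        // Recombine: lane j is weighted by 31^(LANES - 1 - j), via Horner's rule.
        int r = 0;
        for (int j = 0; j < LANES; j++) {
            r = 31 * r + acc[j];
        }
        int h = seed + r;

        // Scalar tail for the remaining n % LANES elements.
        for (int i = main; i < n; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

For any non-null int[] a, UnrolledHashSketch.hash(a) should agree with Arrays.hashCode(a). Raising the number of independent accumulators lengthens the recombination step but leaves the result unchanged, which is why wider unrolling should not hurt designs with lower multiply-accumulate throughput.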

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2069971112