RFR: 8322770: Implement C2 VectorizedHashCode on AArch64
Andrew Haley
aph at openjdk.org
Mon Apr 22 15:51:28 UTC 2024
On Mon, 22 Apr 2024 14:45:49 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:
> > A high-performance AArch64 implementation can issue four multiply-accumulate vector instructions per cycle, with a 3-clock latency.
>
> @theRealAph , hmph, could you elaborate on what spec you refer to here?
It's not so much a spec as Dougall's measured Apple M1 (Firestorm) performance: https://dougallj.github.io/applecpu/measurements/firestorm/UMLAL_v_4S.html
Other high-end AArch64 designs can't do that, but they won't suffer from going wider. We should be able to sustain a pipelined 4 int-wide elements per cycle.
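
To illustrate why width matters here: taking the figures above at face value, four multiply-accumulates issued per cycle with a 3-cycle latency means roughly 4 x 3 = 12 independent operations must be in flight to keep the units busy, so a single dependent chain h = 31*h + a[i] cannot get close. The usual way around this is to split the polynomial across independent accumulators. What follows is only a minimal scalar Java sketch of that idea with four accumulators; the class and method names are hypothetical, and it is not the code from the PR, which works on vector registers.

```java
import java.util.Arrays;

// Hypothetical illustration only: not the intrinsic from the PR.
final class UnrolledHash {
    // Four independent accumulator chains, each advancing by 31^4 per step.
    // Int overflow wraps, exactly as in Arrays.hashCode(int[]).
    static int unrolledHash(int[] a) {
        final int P1 = 31;
        final int P2 = 31 * 31;
        final int P3 = 31 * 31 * 31;
        final int P4 = 31 * 31 * 31 * 31;

        // The initial value 1 goes in the last lane, whose final weight is 31^0,
        // so after k blocks it has been scaled by exactly 31^(4k).
        int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 1;

        int i = 0;
        int blocked = a.length - (a.length % 4);
        for (; i < blocked; i += 4) {
            // No accumulator depends on another, so these can overlap.
            acc0 = acc0 * P4 + a[i];
            acc1 = acc1 * P4 + a[i + 1];
            acc2 = acc2 * P4 + a[i + 2];
            acc3 = acc3 * P4 + a[i + 3];
        }

        // Recombine the lanes with their positional weights.
        int h = acc0 * P3 + acc1 * P2 + acc2 * P1 + acc3;

        // Scalar tail for the remaining 0..3 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        // Prints whether the unrolled result agrees with the library result.
        System.out.println(unrolledHash(data) == Arrays.hashCode(data));
    }
}
```

Four accumulators are used here just to keep the sketch short; by the arithmetic above, covering a 3-cycle latency at four issues per cycle would take on the order of twelve independent chains.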
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2069971112