RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5]

Thu Aug 22 15:56:09 UTC 2024

On Thu, 22 Aug 2024 12:23:04 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:

> > One thing that's odd, but not really wrong. Why do you process byte arrays 32-wide instead of 16-wide like everything else? It makes the code more complex than doing everything 8-wide ...
> 
> There's no arrangement specifier for `LD1 (multiple structures)` that loads four single-byte elements per SIMD&FP register.

Isn't that `ld1 V1.s, V2.s, V3.s, V4.s, [x1]`?

> > ... and doesn't seem to increase performance, either with my measurements or yours.
> 
> What measurements are you referring to here?

Your performance figures, and mine, as quoted in this PR.

It's really not important, though.
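
For anyone following along, "N-wide" here means how many array elements each loop iteration folds into the hash. Below is a minimal scalar sketch in Java of the 8-wide case, assuming the usual `31 * h + element` recurrence of `Arrays.hashCode(byte[])`. The class and method names are hypothetical, and this is only an illustration of the arithmetic, not the intrinsic code from the PR:

```java
import java.util.Arrays;

public class HashCodeSketch {
    // Coefficients for one 8-element chunk: POW[0] = 31^7 ... POW[7] = 31^0.
    // int arithmetic wraps modulo 2^32, exactly as in the scalar recurrence.
    private static final int[] POW = new int[8];
    private static final int POW8; // 31^8, used to advance the running hash by one chunk

    static {
        int p = 1;
        for (int i = 7; i >= 0; i--) {
            POW[i] = p;
            p *= 31;
        }
        POW8 = p;
    }

    // Hypothetical helper: consumes 8 bytes per iteration, then finishes the
    // remaining 0..7 bytes with the plain recurrence. Unlike Arrays.hashCode,
    // it does not accept null.
    static int hashCode8Wide(byte[] a) {
        int h = 1; // same initial value as Arrays.hashCode(byte[])
        int i = 0;
        for (; i + 8 <= a.length; i += 8) {
            int chunk = 0;
            for (int j = 0; j < 8; j++) {
                chunk += a[i + j] * POW[j];
            }
            h = h * POW8 + chunk;
        }
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

A wider variant only changes the chunk size and the coefficient table; the result must stay identical to `Arrays.hashCode(byte[])` for every input length.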

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2305108265