RFR: 8322770: Implement C2 VectorizedHashCode on AArch64

Thu Aug 22 11:08:06 UTC 2024

On Fri, 5 Jul 2024 17:23:04 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:

> * [x] For arrays shorter than the number of elements processed by a single iteration of the Neon loop, performance is not optimal, though still better than the baseline's.

Previously I noticed that unrolling the scalar loop by 8 instead of 4 might improve performance for shorter arrays/strings. After running benchmarks on Neoverse N1, Neoverse V1 and Neoverse V2, I can say that the results are not consistent across these CPUs. While increasing the unroll factor does slightly improve performance on Neoverse N1, the same doesn't hold for Neoverse V1/V2. Thus I don't think it's worth the increased code size.
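
For readers unfamiliar with the term, here is a rough Java illustration of what "unrolling the scalar loop by 4" means for the 31-based polynomial hash. This is only a sketch: the actual change is an AArch64 assembly intrinsic, and the class and method names below are hypothetical, not taken from the PR.

```java
import java.util.Arrays;

// Hypothetical illustration only; the PR implements this in AArch64 assembly.
class UnrolledHashSketch {
    // Powers of 31 used to fold four elements into the hash in one step.
    static final int P1 = 31;
    static final int P2 = 31 * 31;           // 961
    static final int P3 = 31 * 31 * 31;      // 29791
    static final int P4 = 31 * 31 * 31 * 31; // 923521

    // Scalar loop unrolled by 4, followed by a one-element-at-a-time tail.
    static int hashUnrolled4(int[] a) {
        int h = 1; // same initial value as Arrays.hashCode(int[])
        int i = 0;
        // Main loop: consumes four elements per iteration.
        for (; i + 3 < a.length; i += 4) {
            h = P4 * h + P3 * a[i] + P2 * a[i + 1] + P1 * a[i + 2] + a[i + 3];
        }
        // Tail: arrays shorter than the unroll factor run only this loop.
        for (; i < a.length; i++) {
            h = P1 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        // Compare against the library implementation.
        System.out.println(hashUnrolled4(data) == Arrays.hashCode(data));
    }
}
```

Unrolling by 8 would extend the main loop body to eight elements with powers up to 31^8. That makes the loop body, and so the generated code, larger, which is the cost weighed against the inconsistent benchmark gains above.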

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2304391438