RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors

Thu Mar 6 09:17:53 UTC 2025

On Wed, 5 Mar 2025 10:03:31 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Hi @XiaohongGong, thanks for testing this variation. I also expected it to perform relatively better due to the absence of the load instruction. It might still help in larger real-world workloads, where fewer load instructions (or fewer instructions overall) can improve performance by reducing pressure on the icache/iTLB.
>> As for AArch64 Neon machines that we can test this on: we only have N1 and V2 (Grace) machines, which support 128-bit Neon. V1 is 256-bit Neon/SVE, which will execute the `sve tbl` instruction instead. I can of course disable SVE and run the Neon instructions on V1, but I don't think that would really make any difference. So for 128-bit Neon machines, I too can test only on N1 and V2, which you've already done. Do you have a specific machine in mind that you'd like this to be tested on?
>
> Thanks for the clarification, @Bhavana-Kilambi. I agree with you that it may not make any difference on other machines. So do you suggest that I change the pattern right now, or revisit this part once we hit a performance issue on other real-world workloads?

Sure, I am fine with going ahead with the current implementation and revisiting it if we encounter any performance issues. Thanks for testing.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1982979067