RFR: 8280510: AArch64: Vectorize operations with loop induction variable
Andrew Dinn
adinn at openjdk.java.net
Tue Mar 15 09:56:44 UTC 2022
On Wed, 16 Feb 2022 08:26:14 GMT, Pengfei Li <pli at openjdk.org> wrote:
> AArch64 SVE has an instruction that populates an SVE vector register
> with incrementing indices. With this we can vectorize some loop
> operations that take the induction variable as an operand, such as the
> one below.
>
>   for (int i = 0; i < count; i++) {
>     b[i] = a[i] * i;
>   }
>
> This patch enables the vectorization of operations with the loop
> induction variable by extending the current scope of C2 superword
> vectorizable packs. Before this patch, any scalar input node in a
> vectorizable pack had to be an out-of-loop invariant. This patch also
> takes the induction variable input into consideration. It allows the
> input to be the iv phi node, or the phi plus its index offset, and
> creates a `PopulateIndexNode` to generate a vector filled with
> incrementing indices. On AArch64 SVE, the final generated code for the
> loop above looks like this.
>
>   add     x12, x16, x10
>   add     x12, x12, #0x10
>   ld1w    {z16.s}, p7/z, [x12]
>   index   z17.s, w1, #1
>   mul     z17.s, p7/m, z17.s, z16.s
>   add     x10, x17, x10
>   add     x10, x10, #0x10
>   st1w    {z17.s}, p7, [x10]
>
> As there is no populate-index instruction on AArch64 NEON or on other
> platforms like x86, a function named `is_populate_index_supported()` is
> added to the VectorNode class for the backend support check.
>
> Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
> were run and no issues were found. Hotspot jtreg has existing tests in
> `compiler/c2/cr7192963/Test*Vect.java` covering this kind of use case, so
> no new jtreg test is added in this patch. A new JMH benchmark is added
> in this patch and was run on a 512-bit SVE machine. The results below
> show that performance improves significantly in some cases.
>
> Benchmark                      Performance
> IndexVector.exprWithIndex1     ~7.7x
> IndexVector.exprWithIndex2     ~13.3x
> IndexVector.indexArrayFill     ~5.7x
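
To make the quoted description concrete, here is a minimal Java sketch of what the `index z17.s, w1, #1` step contributes: a vector whose lane k holds base + k * step, which is then multiplied lane-wise with the loaded values. This is plain scalar Java that only emulates the lanes. The class name, method names, the fixed lane count of 16 (ints in a 512-bit vector) and the scalar tail loop are all assumptions made for illustration; none of them come from the patch.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from the patch.
class PopulateIndexSketch {
    // Assumed lane count: 16 ints, as in a 512-bit vector.
    static final int LANES = 16;

    // Scalar reference: the loop from the quoted description.
    static int[] scalar(int[] a) {
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] * i;
        }
        return b;
    }

    // Emulates a populate-index operation: lane k holds base + k * step.
    static int[] populateIndex(int base, int step) {
        int[] v = new int[LANES];
        for (int k = 0; k < LANES; k++) {
            v[k] = base + k * step;
        }
        return v;
    }

    // Lane-wise emulation of the vectorized loop. The leftover elements
    // are handled by a scalar tail loop, which is a simplification.
    static int[] emulatedVector(int[] a) {
        int[] b = new int[a.length];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int[] iv = populateIndex(i, 1);      // indices i, i+1, ..., i+15
            for (int k = 0; k < LANES; k++) {
                b[i + k] = a[i + k] * iv[k];     // lane-wise multiply
            }
        }
        for (; i < a.length; i++) {
            b[i] = a[i] * i;
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = new int[37];
        for (int i = 0; i < a.length; i++) {
            a[i] = 3 * i - 7;
        }
        // Reports whether both versions agree element for element.
        System.out.println(Arrays.equals(scalar(a), emulatedVector(a)));
    }
}
```

The array length of 37 is chosen so that both the full-vector path and the tail path are exercised.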
The code looks ok to me modulo the cut-and-paste error I highlighted.
Do you have any way of ensuring that the generated code is correct (other than the fact that it runs quicker)?
Also, have you checked that this does not invalidate any other SVE tests? I would expect some failures thanks to the cut-and-paste error.
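
As an illustration of the kind of correctness check being asked about, the sketch below calls a kernel of the same shape as the loop in the PR description many times and compares every result against an expectation computed by a differently shaped loop. All names, the array size and the iteration count are hypothetical and are not taken from the patch or from the existing jtreg tests. Whether C2 actually compiles the kernel, and whether the SVE path is taken, depends on the hardware and VM flags, so this only shows the structure of such a check.

```java
import java.util.Arrays;

// Hypothetical check; names are illustrative, not from the patch or jtreg.
class IndexLoopCheck {
    static final int SIZE = 1024;      // assumed array length
    static final int ITERS = 20_000;   // assumed to be enough calls for the JIT to kick in

    // Kernel under test: same shape as the loop in the PR description.
    static void kernel(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] * i;
        }
    }

    // Expectation computed with a different loop shape: backwards iteration
    // and long arithmetic narrowed to int, which yields the same low 32 bits
    // as an int multiply.
    static int[] expected(int[] a) {
        int[] e = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--) {
            e[i] = (int) ((long) a[i] * (long) i);
        }
        return e;
    }

    // Returns the index of the first mismatch seen, or -1 if every run agreed.
    static int firstMismatch() {
        int[] a = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            a[i] = i * 1_000_003 + 5;  // large values so the products wrap around
        }
        int[] want = expected(a);
        int[] b = new int[SIZE];
        for (int n = 0; n < ITERS; n++) {
            Arrays.fill(b, 0);         // clear stale results before each run
            kernel(a, b);
            for (int i = 0; i < SIZE; i++) {
                if (b[i] != want[i]) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstMismatch();
        System.out.println(bad < 0 ? "all runs matched" : "mismatch at index " + bad);
    }
}
```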
-------------
PR: https://git.openjdk.java.net/jdk/pull/7491