RFR: 8280510: AArch64: Vectorize operations with loop induction variable [v2]
Pengfei Li
pli at openjdk.java.net
Wed Mar 16 01:51:20 UTC 2022
> AArch64 SVE has an instruction (`INDEX`) that populates an SVE vector
> register with incrementing indices. With it, C2 can vectorize some loop
> operations that take the induction variable as an operand, such as the
> following:
>
> for (int i = 0; i < count; i++) {
>     b[i] = a[i] * i;
> }
>
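For readers unfamiliar with SVE, here is a minimal Java sketch of what "populating incrementing indices" means. It assumes the semantics of the SVE `INDEX` instruction with a scalar base and an immediate step (lane k receives base + k * step), and assumes 16 int lanes, which is what a 512-bit SVE vector holds. The class and method names are hypothetical and are not part of HotSpot.

```java
import java.util.Arrays;

// Hypothetical illustration, not HotSpot code: a scalar model of SVE INDEX.
public class IndexDemo {
    // Assumption: a 512-bit SVE vector holds 512 / 32 = 16 int lanes.
    static final int INT_LANES = 512 / Integer.SIZE;

    // Lane k of the result receives base + k * step, like "index zd.s, base, #step".
    static int[] populateIndex(int base, int step, int lanes) {
        int[] v = new int[lanes];
        for (int k = 0; k < lanes; k++) {
            v[k] = base + k * step;
        }
        return v;
    }

    public static void main(String[] args) {
        // With induction variable value 32 as the base and step 1, the vector
        // holds the indices of the next 16 iterations: 32, 33, ..., 47.
        System.out.println(Arrays.toString(populateIndex(32, 1, INT_LANES)));
    }
}
```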
> This patch enables vectorization of operations that use the loop
> induction variable by extending the current scope of C2 superword
> vectorizable packs. Before this patch, every scalar input node of a
> vectorizable pack had to be loop-invariant (defined outside the loop).
> This patch also takes the induction variable input into account: it
> allows the input to be the iv phi node, or the phi plus its index
> offset, and creates a `PopulateIndexNode` to generate a vector filled
> with incrementing indices. On AArch64 SVE, the final generated code for
> the loop above looks like this:
>
> add x12, x16, x10
> add x12, x12, #0x10
> ld1w {z16.s}, p7/z, [x12]
> index z17.s, w1, #1
> mul z17.s, p7/m, z17.s, z16.s
> add x10, x17, x10
> add x10, x10, #0x10
> st1w {z17.s}, p7, [x10]
>
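The lane-level effect of the SVE sequence above can be emulated in plain Java. This is a minimal sketch, not C2 output: it assumes 16 int lanes (a 512-bit vector), models the "phi plus its index offset" case with a hypothetical `offset` parameter, and handles leftover elements with a scalar tail loop, which is an assumption of the sketch rather than a description of the generated code. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration, not C2 output: a lane-by-lane emulation of the
// vectorized loop b[i] = a[i] * (i + offset), assuming 16 int lanes.
public class VectorizedIvLoop {
    static final int LANES = 16; // assumption: 512-bit SVE, 32-bit int lanes

    // Scalar reference: the loop as written in Java source.
    static int[] scalar(int[] a, int offset) {
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] * (i + offset);
        }
        return b;
    }

    // One emulated vector iteration per LANES elements.
    static int[] vectorized(int[] a, int offset) {
        int[] b = new int[a.length];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int[] idx = new int[LANES]; // models "index" with base i + offset, step 1
            int[] va = new int[LANES];  // models the "ld1w" load of a[i .. i+15]
            for (int k = 0; k < LANES; k++) {
                idx[k] = i + offset + k;
                va[k] = a[i + k];
            }
            for (int k = 0; k < LANES; k++) {
                b[i + k] = idx[k] * va[k]; // models "mul" followed by "st1w"
            }
        }
        // Assumption of this sketch: leftover elements run in a scalar tail loop.
        for (; i < a.length; i++) {
            b[i] = a[i] * (i + offset);
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = new int[37];
        for (int i = 0; i < a.length; i++) a[i] = 3 * i - 50;
        for (int offset : new int[] {0, 5}) {
            System.out.println("offset " + offset + ": "
                + Arrays.equals(scalar(a, offset), vectorized(a, offset)));
        }
    }
}
```

Because lane k of the index vector holds i + offset + k, each lane carries exactly the value the scalar loop uses in iteration i + k, so replacing the scalar iv input with a populated index vector leaves the result unchanged.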
> Since AArch64 NEON and other platforms such as x86 have no
> index-populating instruction, a function named
> `is_populate_index_supported()` is added to the `VectorNode` class so
> that the vectorizer can check for backend support.
>
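The role of the backend support check can be pictured with a toy Java model. This is a hypothetical sketch, not HotSpot code: the `Backend` enum, `isPopulateIndexSupported`, and `canVectorizePack` are invented names, and the model only assumes what the text states, namely that SVE has an index-populating instruction while NEON and x86 do not.

```java
// Hypothetical toy model, not HotSpot code: a vectorizer consulting a backend
// capability check before vectorizing a pack whose input is the iv.
public class PopulateIndexCheck {
    enum Backend { AARCH64_SVE, AARCH64_NEON, X86 }

    // Assumption taken from the text: only SVE can populate an index vector.
    static boolean isPopulateIndexSupported(Backend backend) {
        return backend == Backend.AARCH64_SVE;
    }

    // In this model, a pack that uses the induction variable is vectorized only
    // if the backend can build the index vector; packs whose scalar inputs are
    // all loop-invariant are unaffected by the check.
    static boolean canVectorizePack(Backend backend, boolean usesInductionVariable) {
        return !usesInductionVariable || isPopulateIndexSupported(backend);
    }

    public static void main(String[] args) {
        for (Backend b : Backend.values()) {
            System.out.println(b + ": " + canVectorizePack(b, true));
        }
    }
}
```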
> Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
> were run and no issues were found. Existing HotSpot jtreg tests in
> `compiler/c2/cr7192963/Test*Vect.java` already cover this kind of use
> case, so no new jtreg test is added in this patch. A new JMH benchmark
> is added and was run on a 512-bit SVE machine. The results below show
> that performance improves significantly (about 5.7x to 13.3x) on these
> benchmarks.
>
> Benchmark                    Speedup
> IndexVector.exprWithIndex1   ~7.7x
> IndexVector.exprWithIndex2   ~13.3x
> IndexVector.indexArrayFill   ~5.7x
Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
- Fix cut-and-paste error
- Merge branch 'master' into indexvector
- 8280510: AArch64: Vectorize operations with loop induction variable
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/7491/files
- new: https://git.openjdk.java.net/jdk/pull/7491/files/e85e8ef4..2fdb3e0c
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7491&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7491&range=00-01
Stats: 42778 lines in 1090 files changed: 28920 ins; 7727 del; 6131 mod
Patch: https://git.openjdk.java.net/jdk/pull/7491.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7491/head:pull/7491
PR: https://git.openjdk.java.net/jdk/pull/7491
More information about the hotspot-compiler-dev
mailing list