RFR: 8280510: AArch64: Vectorize operations with loop induction variable [v2]
Pengfei Li
pli at openjdk.java.net
Thu Apr 28 14:08:48 UTC 2022
On Tue, 19 Apr 2022 08:45:05 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:
>> Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Fix cut-and-paste error
>> - Merge branch 'master' into indexvector
>> - 8280510: AArch64: Vectorize operations with loop induction variable
>>
>> AArch64 SVE has an instruction that populates an SVE vector register
>> with incrementing indices. With it we can vectorize some loop operations
>> that take the induction variable as an operand, such as the one below.
>>
>>   for (int i = 0; i < count; i++) {
>>     b[i] = a[i] * i;
>>   }
>>
>> This patch enables the vectorization of operations on the loop induction
>> variable by extending the current scope of C2 superword vectorizable packs.
>> Before this patch, any scalar input node in a vectorizable pack had to be
>> an out-of-loop invariant. This patch takes the induction variable input
>> into consideration. It allows the input to be the iv phi node or the phi
>> plus its index offset, and creates a PopulateIndexNode to generate a vector
>> filled with incrementing indices. On AArch64 SVE, the final generated code
>> for the loop above looks like this.
>>
>> add x12, x16, x10
>> add x12, x12, #0x10
>> ld1w {z16.s}, p7/z, [x12]
>> index z17.s, w1, #1
>> mul z17.s, p7/m, z17.s, z16.s
>> add x10, x17, x10
>> add x10, x10, #0x10
>> st1w {z17.s}, p7, [x10]
>>
>> As there is no populate-index instruction on AArch64 NEON or on other
>> platforms such as x86, a function named is_populate_index_supported() is
>> added to the VectorNode class for the backend support check.
>>
>> Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
>> were tested and no issues were found. Hotspot jtreg has existing tests in
>> compiler/c2/cr7192963/Test*Vect.java covering this kind of use case, so
>> no new jtreg test is added in this patch. A new JMH benchmark is added in
>> this patch and was run on a 512-bit SVE machine. The results below show
>> that performance improves significantly in some cases.
>>
>>   Benchmark                     Performance
>>   IndexVector.exprWithIndex1    ~7.7x
>>   IndexVector.exprWithIndex2    ~13.3x
>>   IndexVector.indexArrayFill    ~5.7x
>
> Please resolve the merge conflicts.
Hi @TobiHartmann ,
> Looks good to me too. Have you considered adding an IR verification test?
Thanks for your review, and good question!
You may remember that we contributed a vectorization test framework together with our post-loop fix (https://github.com/openjdk/jdk/pull/6828). We are currently enabling IR tests in that framework and have made some progress, so IR tests for all vectorizable loops will be added there (in `hotspot:compiler/vectorization/runner/`) in the near future.
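
For readers following the quoted description, here is a minimal Java sketch of what the `b[i] = a[i] * i` loop computes when it is processed a vector at a time. It is only a scalar emulation, not the C2 implementation: the class name `IndexVectorSketch`, the helper `populateIndex`, and the fixed `LANES = 16` (the number of 32-bit ints in a 512-bit vector) are assumptions made for illustration, and the leftover elements are handled by a plain scalar tail loop here rather than by predication.

```java
import java.util.Arrays;

public class IndexVectorSketch {
    // Hypothetical lane count for illustration: a 512-bit vector holds 16 ints.
    static final int LANES = 16;

    // Scalar reference: the loop shape from the PR description.
    static void scalar(int[] a, int[] b, int count) {
        for (int i = 0; i < count; i++) {
            b[i] = a[i] * i;
        }
    }

    // Emulates a populate-index operation with step 1:
    // lane l of the result holds start + l.
    static int[] populateIndex(int start, int lanes) {
        int[] idx = new int[lanes];
        for (int l = 0; l < lanes; l++) {
            idx[l] = start + l;
        }
        return idx;
    }

    // Processes LANES elements per iteration, multiplying each loaded chunk
    // of a[] by the index vector, then finishes the remainder one by one.
    static void chunked(int[] a, int[] b, int count) {
        int i = 0;
        for (; i + LANES <= count; i += LANES) {
            int[] idx = populateIndex(i, LANES);
            for (int l = 0; l < LANES; l++) {
                b[i + l] = a[i + l] * idx[l];
            }
        }
        for (; i < count; i++) { // scalar tail (an assumption of this sketch)
            b[i] = a[i] * i;
        }
    }

    public static void main(String[] args) {
        int count = 37; // deliberately not a multiple of LANES
        int[] a = new int[count];
        for (int i = 0; i < count; i++) {
            a[i] = 3 * i + 1;
        }
        int[] expected = new int[count];
        int[] actual = new int[count];
        scalar(a, expected, count);
        chunked(a, actual, count);
        System.out.println(Arrays.equals(expected, actual)); // prints "true"
    }
}
```

The point of the sketch is that the index vector for each chunk depends only on the chunk's starting value of `i`, which is why a single populate-index operation per vector iteration is enough.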
-------------
PR: https://git.openjdk.java.net/jdk/pull/7491