RFR: 8280510: AArch64: Vectorize operations with loop induction variable [v3]

Tobias Hartmann thartmann at openjdk.java.net
Mon Apr 25 07:05:33 UTC 2022


On Wed, 20 Apr 2022 05:01:41 GMT, Pengfei Li <pli at openjdk.org> wrote:

>> AArch64 SVE has an instruction that populates an SVE vector register
>> with incrementing indices. With it, we can vectorize some loop
>> operations that take the induction variable as an operand, such as:
>> 
>>   for (int i = 0; i < count; i++) {
>>     b[i] = a[i] * i;
>>   }
>> 
>> This patch enables the vectorization of operations on the loop induction
>> variable by extending the current scope of C2 superword vectorizable packs.
>> Before this patch, every scalar input node in a vectorizable pack had to
>> be an out-of-loop invariant. This patch also takes the induction variable
>> input into consideration: it allows the input to be the iv phi node or
>> the phi plus its index offset, and creates a `PopulateIndexNode` to
>> generate a vector filled with incrementing indices. On AArch64 SVE, the
>> final generated code for the loop above looks like this:
>> 
>>   add     x12, x16, x10
>>   add     x12, x12, #0x10
>>   ld1w    {z16.s}, p7/z, [x12]
>>   index   z17.s, w1, #1
>>   mul     z17.s, p7/m, z17.s, z16.s
>>   add     x10, x17, x10
>>   add     x10, x10, #0x10
>>   st1w    {z17.s}, p7, [x10]
>> 
>> As there is no index-populating instruction on AArch64 NEON or on other
>> platforms such as x86, a function named `is_populate_index_supported()`
>> is added to the VectorNode class for the backend support check.
>> 
>> Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
>> were run and no issues were found. Hotspot jtreg already has tests in
>> `compiler/c2/cr7192963/Test*Vect.java` covering these kinds of use cases,
>> so no new jtreg test is added in this patch. A new JMH benchmark is added
>> in this patch and was run on a 512-bit SVE machine. The results below
>> show that performance improves significantly in some cases.
>> 
>>   Benchmark                           Speedup
>>   IndexVector.exprWithIndex1            ~7.7x
>>   IndexVector.exprWithIndex2           ~13.3x
>>   IndexVector.indexArrayFill            ~5.7x
>
> Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Merge branch 'master' into indexvector
>  - Fix cut-and-paste error
>  - Merge branch 'master' into indexvector
>  - 8280510: AArch64: Vectorize operations with loop induction variable

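To make the transformation concrete, here is a hypothetical scalar emulation in Java (not HotSpot code and not part of the patch). It assumes 16 int lanes, as on a 512-bit SVE machine, models the SVE `index` instruction with a helper named `populateIndex` (a name invented for this sketch), and covers both the plain `i` input and the "phi plus offset" input. The remainder is handled by a scalar tail loop; the code C2 actually emits may be shaped differently.

```java
import java.util.Arrays;

/**
 * Hypothetical scalar emulation (not HotSpot code) of vectorizing
 *   for (int i = 0; i < n; i++) b[i] = a[i] * (i + offset);
 * Each main-loop iteration handles VL lanes; the "index vector"
 * {i+offset, i+offset+1, ..., i+offset+VL-1} stands in for the SVE
 * INDEX instruction (start = i + offset, step = 1).
 */
public class IndexVectorSketch {
    // Assumed lane count: 16 x 32-bit ints fill a 512-bit SVE register.
    static final int VL = 16;

    // Reference scalar loop; offset == 0 is the loop from the PR description.
    static int[] scalar(int[] a, int offset) {
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] * (i + offset);
        }
        return b;
    }

    // Emulates "index zN.s, wStart, #1": lane k holds start + k.
    static int[] populateIndex(int start) {
        int[] v = new int[VL];
        for (int k = 0; k < VL; k++) {
            v[k] = start + k;
        }
        return v;
    }

    // Vector-style main loop over full VL-sized chunks, then a scalar tail.
    static int[] vectorized(int[] a, int offset) {
        int n = a.length;
        int[] b = new int[n];
        int i = 0;
        for (; i + VL <= n; i += VL) {
            int[] idx = populateIndex(i + offset); // index vector for iv (+ offset)
            for (int k = 0; k < VL; k++) {         // lane-wise multiply
                b[i + k] = a[i + k] * idx[k];
            }
        }
        for (; i < n; i++) {                        // remaining elements, scalar
            b[i] = a[i] * (i + offset);
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = new int[37];
        for (int i = 0; i < a.length; i++) {
            a[i] = 3 * i - 50;
        }
        System.out.println(Arrays.equals(scalar(a, 0), vectorized(a, 0)));
        System.out.println(Arrays.equals(scalar(a, 5), vectorized(a, 5)));
    }
}
```

Running `main` should print `true` twice: the chunked version computes the same values as the scalar loop for both the plain `i` case and the `i + offset` case, which is the equivalence the superword transformation relies on.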
I ran some testing; all tests passed.

Do we need to add the new node to `MatchRule::is_expensive()`?

Do we need to add a declaration to `vmStructs.cpp`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/7491


More information about the hotspot-compiler-dev mailing list