RFR: 8280510: AArch64: Vectorize operations with loop induction variable [v2]

Pengfei Li pli at openjdk.java.net
Tue Apr 26 04:57:56 UTC 2022


On Tue, 19 Apr 2022 08:45:05 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Fix cut-and-paste error
>>  - Merge branch 'master' into indexvector
>>  - 8280510: AArch64: Vectorize operations with loop induction variable
>>    
>>    AArch64 SVE has an instruction that populates an SVE vector register
>>    with incrementing indices. With it, we can vectorize some loop
>>    operations that take the induction variable as an operand, such as below.
>>    
>>      for (int i = 0; i < count; i++) {
>>        b[i] = a[i] * i;
>>      }
>>    
>>    This patch enables the vectorization of operations with the loop
>>    induction variable by extending the current scope of C2 superword
>>    vectorizable packs. Before this patch, any scalar input node in a
>>    vectorizable pack had to be an out-of-loop invariant. This patch takes
>>    the induction variable input into consideration. It allows the input
>>    to be the iv phi node, or the phi plus its index offset, and creates a
>>    PopulateIndexNode to generate a vector filled with incrementing
>>    indices. On AArch64 SVE, the final generated code for the above loop
>>    expression looks like below.
>>    
>>      add     x12, x16, x10
>>      add     x12, x12, #0x10
>>      ld1w    {z16.s}, p7/z, [x12]
>>      index   z17.s, w1, #1
>>      mul     z17.s, p7/m, z17.s, z16.s
>>      add     x10, x17, x10
>>      add     x10, x10, #0x10
>>      st1w    {z17.s}, p7, [x10]
>>    
>>    As there is no index-populating instruction on AArch64 NEON or on other
>>    platforms like x86, a function named is_populate_index_supported() is
>>    added to the VectorNode class for the backend support check.
>>    
>>    Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
>>    were tested and no issues were found. Hotspot jtreg has existing tests
>>    in compiler/c2/cr7192963/Test*Vect.java covering this kind of use case,
>>    so no new jtreg test is created within this patch. A new JMH benchmark
>>    is created in this patch and tested on a 512-bit SVE machine. The test
>>    results below show that performance can be significantly improved in
>>    some cases.
>>    
>>      Benchmark                       Performance
>>      IndexVector.exprWithIndex1            ~7.7x
>>      IndexVector.exprWithIndex2           ~13.3x
>>      IndexVector.indexArrayFill            ~5.7x
>
> Please resolve the merge conflicts.

Hi @TobiHartmann ,

> Do we need to add the new node to MatchRule::is_expensive()?
> 
> Do we need to add a declaration to vmStructs.cpp?

I've fixed both of these in my latest commit. In that commit, I also aligned the matching rule code with the other rules in the AArch64 ad file.
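
For anyone following the thread, the transformation described in the quoted commit message can be sketched in plain Java. This is only an illustration of the semantics, not the C2 implementation: the class name `IndexVectorSketch`, the helper `populateIndex`, and the fixed `LANES = 16` (the number of 32-bit ints in a 512-bit SVE register) are assumptions made up for this sketch. The helper stands in for what a PopulateIndexNode computes, and the inner lane loop stands in for a single vector multiply.

```java
import java.util.Arrays;

class IndexVectorSketch {
    // Assumed lane count: a 512-bit vector holds 16 ints. Hypothetical constant.
    static final int LANES = 16;

    // Scalar reference: the loop from the commit message.
    static void scalar(int[] a, int[] b, int count) {
        for (int i = 0; i < count; i++) {
            b[i] = a[i] * i;
        }
    }

    // Emulates an index-populating operation: start, start+step, start+2*step, ...
    static int[] populateIndex(int start, int step) {
        int[] v = new int[LANES];
        for (int lane = 0; lane < LANES; lane++) {
            v[lane] = start + lane * step;
        }
        return v;
    }

    // Blocked form: each outer iteration handles LANES elements at once.
    static void blocked(int[] a, int[] b, int count) {
        int i = 0;
        for (; i + LANES <= count; i += LANES) {
            int[] idx = populateIndex(i, 1);   // vector of i, i+1, ..., i+LANES-1
            for (int lane = 0; lane < LANES; lane++) {
                b[i + lane] = a[i + lane] * idx[lane];  // lane-wise multiply
            }
        }
        // Scalar tail for the leftover elements.
        for (; i < count; i++) {
            b[i] = a[i] * i;
        }
    }

    public static void main(String[] args) {
        int n = 37;
        int[] a = new int[n];
        for (int k = 0; k < n; k++) a[k] = k + 3;
        int[] expected = new int[n];
        int[] actual = new int[n];
        scalar(a, expected, n);
        blocked(a, actual, n);
        System.out.println(Arrays.equals(expected, actual));
    }
}
```

The point of the sketch is that the induction variable is no longer a loop invariant scalar input; it becomes a per-lane vector that is regenerated from the current value of `i` in every vector iteration.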

-------------

PR: https://git.openjdk.java.net/jdk/pull/7491

