RFR: 8280510: AArch64: Vectorize operations with loop induction variable [v2]

Pengfei Li pli at openjdk.java.net
Wed Mar 16 01:51:20 UTC 2022


> AArch64 SVE has an instruction (`index`) that populates an SVE vector
> register with incrementing indices. With it we can vectorize some loop
> operations that take the induction variable as an operand, such as the
> one below.
> 
>   for (int i = 0; i < count; i++) {
>     b[i] = a[i] * i;
>   }
> 
> This patch enables vectorization of operations on the loop induction
> variable by extending the current scope of C2 superword vectorizable
> packs. Before this patch, any scalar input node in a vectorizable pack
> had to be a loop invariant defined outside the loop. This patch also
> accepts the induction variable as an input. It allows the input to be
> the iv phi node, or the phi plus its index offset, and creates a
> `PopulateIndexNode` to generate a vector filled with incrementing
> indices. On AArch64 SVE, the final generated code for the loop above
> looks like this.
> 
>   add     x12, x16, x10
>   add     x12, x12, #0x10
>   ld1w    {z16.s}, p7/z, [x12]
>   index   z17.s, w1, #1
>   mul     z17.s, p7/m, z17.s, z16.s
>   add     x10, x17, x10
>   add     x10, x10, #0x10
>   st1w    {z17.s}, p7, [x10]
> 
> As there is no populate-index instruction on AArch64 NEON or on other
> platforms like x86, a function named `is_populate_index_supported()` is
> added to the VectorNode class for the backend support check.
> 
> Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
> were run and no issues were found. Hotspot jtreg already has tests in
> `compiler/c2/cr7192963/Test*Vect.java` covering this kind of use case,
> so no new jtreg test is added in this patch. A new JMH benchmark is
> added in this patch and was run on a 512-bit SVE machine. The results
> below show that performance improves significantly in some cases.
> 
>   Benchmark                           Speedup
>   IndexVector.exprWithIndex1            ~7.7x
>   IndexVector.exprWithIndex2           ~13.3x
>   IndexVector.indexArrayFill            ~5.7x
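
As an illustration only (not code from the patch), the Java sketch below models what the description above says the vectorized loop computes: a vector of incrementing indices is generated once per block and multiplied lane-wise with the loaded elements. The class name `PopulateIndexSketch`, the helper `populateIndex`, the `offset` parameter, and the fixed lane count of 16 (sixteen 32-bit lanes in a 512-bit vector) are all assumptions made for this example; the scalar tail loop is a simplification and may not match how C2 handles leftover iterations.

```java
import java.util.Arrays;

public class PopulateIndexSketch {
    // Assumed lane count: sixteen 32-bit lanes models a 512-bit SVE vector.
    static final int LANES = 16;

    // Models the effect of an "index" style operation with step 1:
    // lane k of the result holds start + k.
    static int[] populateIndex(int start, int lanes) {
        int[] v = new int[lanes];
        for (int k = 0; k < lanes; k++) {
            v[k] = start + k;
        }
        return v;
    }

    // Scalar reference loop: b[i] = a[i] * (i + offset).
    static void scalar(int[] a, int[] b, int count, int offset) {
        for (int i = 0; i < count; i++) {
            b[i] = a[i] * (i + offset);
        }
    }

    // Blocked emulation: one index vector per block of LANES elements,
    // then a lane-wise multiply. Leftover elements use a scalar tail
    // (a simplification chosen for this sketch).
    static void blocked(int[] a, int[] b, int count, int offset) {
        int i = 0;
        for (; i + LANES <= count; i += LANES) {
            int[] idx = populateIndex(i + offset, LANES);
            for (int k = 0; k < LANES; k++) {
                b[i + k] = a[i + k] * idx[k];
            }
        }
        for (; i < count; i++) {
            b[i] = a[i] * (i + offset);
        }
    }

    public static void main(String[] args) {
        int count = 37; // deliberately not a multiple of LANES
        int[] a = new int[count];
        for (int i = 0; i < count; i++) {
            a[i] = 3 * i + 1;
        }
        int[] expected = new int[count];
        int[] actual = new int[count];
        scalar(a, expected, count, 0);
        blocked(a, actual, count, 0);
        System.out.println(Arrays.equals(expected, actual)); // prints "true"
    }
}
```

Passing a non-zero `offset` corresponds to the "phi plus its index offset" case, for example `b[i] = a[i] * (i + 3)`.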

Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Fix cut-and-paste error
 - Merge branch 'master' into indexvector
 - 8280510: AArch64: Vectorize operations with loop induction variable

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7491/files
  - new: https://git.openjdk.java.net/jdk/pull/7491/files/e85e8ef4..2fdb3e0c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7491&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7491&range=00-01

  Stats: 42778 lines in 1090 files changed: 28920 ins; 7727 del; 6131 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7491.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7491/head:pull/7491

PR: https://git.openjdk.java.net/jdk/pull/7491

