RFR: 8282875: AArch64: [vectorapi] Optimize Vector.reduceLane for SVE 64/128 vector size [v3]

Fri May 13 10:07:28 UTC 2022

> This patch speeds up add/mul/min/max reductions on SVE for 64/128-bit
> vector sizes.
> 
> According to Neoverse N2/V1 software optimization guide[1][2], for
> 128-bit vector size reduction operations, we prefer using NEON
> instructions instead of SVE instructions. This patch adds some rules to
> distinguish 64/128-bit vector sizes from the others, so that these two
> special cases generate the same code as NEON. E.g., for
> ByteVector.SPECIES_128, "ByteVector.reduceLanes(VectorOperators.ADD)"
> generates code as below:
> 
> 
>         Before:
>         uaddv   d17, p0, z16.b
>         smov    x15, v17.b[0]
>         add     w15, w14, w15, sxtb
> 
>         After:
>         addv    b17, v16.16b
>         smov    x12, v17.b[0]
>         add     w12, w12, w16, sxtb
> 
> SVE has no multiply reduction instruction, so this patch generates code
> for MulReductionVL using scalar instructions for the 128-bit vector size.
> 
> With this patch, all of these reductions show a performance gain in the
> relevant vector microbenchmarks on my SVE test system.
> 
> [1] https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
> [2] https://developer.arm.com/documentation/PJDOC-466751330-18256/0001
> 
> Change-Id: I4bef0b3eb6ad1bac582e4236aef19787ccbd9b1c
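
For readers less familiar with lane reductions, the following is a minimal
scalar sketch of the values these reductions are expected to produce. It is
not code from the patch: the class and method names (ReduceLanesModel,
addLanes, mulLanes) are hypothetical, and the model only assumes the
documented wrap-around semantics of ByteVector.reduceLanes(VectorOperators.ADD)
and LongVector.reduceLanes(VectorOperators.MUL). A 128-bit vector holds 16
byte lanes or 2 long lanes.

```java
// Hypothetical scalar reference model, for illustration only.
public final class ReduceLanesModel {

    // Models an ADD reduction over byte lanes: lanes are sign-extended,
    // summed, and the result is truncated back to a byte (wraps on overflow).
    public static byte addLanes(byte[] lanes) {
        int sum = 0;
        for (byte lane : lanes) {
            sum += lane;
        }
        return (byte) sum;
    }

    // Models a MUL reduction over long lanes using plain scalar multiplies,
    // in the spirit of the scalar fallback the description mentions for
    // MulReductionVL (two lanes for a 128-bit vector).
    public static long mulLanes(long[] lanes) {
        long product = 1L;
        for (long lane : lanes) {
            product *= lane;
        }
        return product;
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[16];            // one 128-bit vector of bytes
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) (i + 1);          // lanes 1..16, exact sum 136
        }
        System.out.println(addLanes(bytes));    // 136 wraps to -120 as a byte

        long[] longs = {6L, 7L};                // one 128-bit vector of longs
        System.out.println(mulLanes(longs));    // 42
    }
}
```

A model like this can serve as the expected value when checking the
vectorized result in a test, regardless of whether the NEON or SVE code
path is selected.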

Eric Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - refine m4

   Change-Id: I7d76e606485727ca1f3de1d3af733f7e28fb9867
 - Merge jdk:master

   Change-Id: I275eb5834eacce029bc286b1b48128f07dd4070e
 - Generate SVE reduction for MIN/MAX/ADD as before

   Change-Id: Ibc6b9c1f46c42cd07f7bb73b81ed38829e9d0975
 - 8282875: AArch64: [vectorapi] Optimize Vector.reduceLane for SVE 64/128 vector size

   This patch speeds up add/mul/min/max reductions on SVE for 64/128-bit
   vector sizes.

   According to Neoverse N2/V1 software optimization guide[1][2], for
   128-bit vector size reduction operations, we prefer using NEON
   instructions instead of SVE instructions. This patch adds some rules to
   distinguish 64/128-bit vector sizes from the others, so that these two
   special cases generate the same code as NEON. E.g., for
   ByteVector.SPECIES_128, "ByteVector.reduceLanes(VectorOperators.ADD)"
   generates code as below:

   ```
           Before:
           uaddv   d17, p0, z16.b
           smov    x15, v17.b[0]
           add     w15, w14, w15, sxtb

           After:
           addv    b17, v16.16b
           smov    x12, v17.b[0]
           add     w12, w12, w16, sxtb
   ```
   SVE has no multiply reduction instruction, so this patch generates code
   for MulReductionVL using scalar instructions for the 128-bit vector size.

   With this patch, all of these reductions show a performance gain in the
   relevant vector microbenchmarks on my SVE test system.

   [1] https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
   [2] https://developer.arm.com/documentation/PJDOC-466751330-18256/0001

   Change-Id: I4bef0b3eb6ad1bac582e4236aef19787ccbd9b1c

-------------

Changes: https://git.openjdk.java.net/jdk/pull/7999/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7999&range=02
  Stats: 1653 lines in 6 files changed: 672 ins; 691 del; 290 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7999.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7999/head:pull/7999

PR: https://git.openjdk.java.net/jdk/pull/7999