Integrated: 8301012: [vectorapi]: Intrinsify CompressBitsV/ExpandBitsV and add the AArch64 SVE backend implementation

Bhavana Kilambi bkilambi at openjdk.org
Mon Mar 27 08:53:46 UTC 2023


On Mon, 6 Feb 2023 17:23:20 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

> This patch adds mid-end compiler vector IR nodes, CompressBitsV and ExpandBitsV, for the scalar CompressBits and ExpandBits nodes, and adds AArch64 backend support for them using SVE2 instructions (part of the svebitperm feature). Because SVE2 has instructions that map directly to these operations, a large speedup can be observed, which may significantly benefit workloads that run these operations extensively on an SVE2 machine with the svebitperm feature.
> 
> All the JTREG tests under "test/jdk/jdk/incubator/vector" pass successfully with this patch on an SVE2 machine.
> The JMH tests COMPRESS_BITS and EXPAND_BITS from [1] and [2] were run on an AArch64 machine with a 128-bit vector length that supports SVE2 and svebitperm. The gains observed with this patch are as follows:
> 
> 
> Benchmark                       (length)  Mode    Cnt   Gain
> IntMaxVector.COMPRESS_BITS      1024      thrpt   15    81.68x
> IntMaxVector.EXPAND_BITS        1024      thrpt   15    85.65x
> LongMaxVector.COMPRESS_BITS     1024      thrpt   15    70.78x
> LongMaxVector.EXPAND_BITS       1024      thrpt   15    76.31x
> 
> 
> The "Gain" column is the ratio of the benchmark throughput with this patch to the throughput without it. This patch does not change the performance of these operations on machines that do not support these instructions, or on other architectures.
> 
> This patch enables the generation of optimized SVE2 instructions for CompressBits and ExpandBits operations through the Vector API. In addition, together with the scalar implementation of CompressBits and ExpandBits added to the AArch64 backend in https://github.com/openjdk/jdk/commit/bbd8ae78200e4128d4eddf8694835956b5c5f142, it enables auto-vectorization of these nodes on AArch64 machines that support SVE2.
> 
> The following benchmarks were measured on the master branch (auto-vectorization not enabled) and with this patch (auto-vectorization enabled):
>     
> 
>         @Benchmark
>         public void testCompInt() {
>             for (int i = 0; i < length; i++) {
>                 ir[i] = Integer.compress(ia[i], ib[i]);
>             }
>         }
>     
>         @Benchmark
>         public void testExpInt() {
>             for (int i = 0; i < length; i++) {
>                 ir[i] = Integer.expand(ia[i], ib[i]);
>             }
>         }
>     
>         @Benchmark
>         public void testCompLong() {
>             for (int i = 0; i < length; i++) {
>                 lr[i] = Long.compress(la[i], lb[i]);
>             }
>         }
>     
>         @Benchmark
>         public void testExpLong() {
>             for (int i = 0; i < length; i++) {
>                 lr[i] = Long.expand(la[i], lb[i]);
>             }
>         }
>     
>     Benchmark                               (length)  Mode     Cnt   Gain
>     VectorCompressExpand.testCompInt        2048      thrpt    15    9.97x
>     VectorCompressExpand.testCompLong       2048      thrpt    15    5.63x
>     VectorCompressExpand.testExpInt         2048      thrpt    15    9.70x
>     VectorCompressExpand.testExpLong        2048      thrpt    15    5.66x
> 
> 
> The "Gain" column is the ratio of the throughput with this patch to that of the master branch.
> With vectorization enabled (through either SuperWord or the Vector API), significant gains can be observed for CompressBits and ExpandBits.
> 
> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java 
> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java
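
For readers unfamiliar with the operations named above, here is a minimal scalar sketch of what a bit compress and a bit expand compute. It is only an illustration of the semantics documented for `Integer.compress` and `Integer.expand` (available since JDK 19). It is not the HotSpot implementation and not the SVE2 code generated by this patch. The class name `CompressExpandSketch` and its methods are hypothetical names chosen for this example.

```java
public final class CompressExpandSketch {
    private CompressExpandSketch() {}

    /**
     * Reference loop for bit compression (hypothetical helper, not JDK code):
     * the bits of value at positions where mask is 1 are packed, in order,
     * into the low-order bits of the result.
     */
    public static int compress(int value, int mask) {
        int result = 0;
        int outBit = 0; // next free low-order position in the result
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((value >>> bit) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    /**
     * Reference loop for bit expansion (hypothetical helper, not JDK code):
     * the low-order bits of value are scattered, in order, to the positions
     * where mask is 1; all other result bits are 0.
     */
    public static int expand(int value, int mask) {
        int result = 0;
        int inBit = 0; // next low-order bit of value to consume
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((value >>> inBit) & 1) << bit;
                inBit++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int value = 0b1011_0110;
        int mask  = 0b1111_0000;
        int packed = compress(value, mask); // high nibble moved to the low bits
        int spread = expand(packed, mask);  // low bits moved back under the mask
        System.out.println(Integer.toBinaryString(packed)); // prints 1011
        System.out.println(Integer.toBinaryString(spread)); // prints 10110000
    }
}
```

Each loop above visits all 32 bit positions per element, which gives a rough sense of why a single instruction applied across a whole vector can be much faster than a software fallback.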

This pull request has now been integrated.

Changeset: de1c12ed
Author:    Bhavana Kilambi <bkilambi at openjdk.org>
Committer: Xiaohong Gong <xgong at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/de1c12ed636a43cc74b81c48cc987332fe341d7a
Stats:     259 lines in 9 files changed: 254 ins; 2 del; 3 mod

8301012: [vectorapi]: Intrinsify CompressBitsV/ExpandBitsV and add the AArch64 SVE backend implementation

Co-authored-by: Xiaohong Gong <xgong at openjdk.org>
Co-authored-by: Jatin Bhateja <jbhateja at openjdk.org>
Reviewed-by: ngasson, eliu, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/12446

