Integrated: 8301012: [vectorapi]: Intrinsify CompressBitsV/ExpandBitsV and add the AArch64 SVE backend implementation
Bhavana Kilambi
bkilambi at openjdk.org
Mon Mar 27 08:53:46 UTC 2023
On Mon, 6 Feb 2023 17:23:20 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
> This patch adds mid-end compiler vector IR nodes, CompressBitsV and ExpandBitsV, as counterparts of the scalar CompressBits and ExpandBits nodes, and adds AArch64 backend support for them using SVE2 instructions (part of the svebitperm feature). Because SVE2 has instructions that map directly to these operations, a large speedup can be observed, and workloads that use these operations heavily should benefit significantly on machines that support SVE2 with the svebitperm feature.
>
> All the JTREG tests under "test/jdk/jdk/incubator/vector" pass successfully with this patch on an SVE2 machine.
> The JMH tests COMPRESS_BITS and EXPAND_BITS from [1] and [2] were run on an AArch64 machine with a 128-bit vector length that supports SVE2 and svebitperm. The gains observed with this patch are:
>
>
> Benchmark                     (length)   Mode  Cnt    Gain
> IntMaxVector.COMPRESS_BITS        1024  thrpt   15  81.68x
> IntMaxVector.EXPAND_BITS          1024  thrpt   15  85.65x
> LongMaxVector.COMPRESS_BITS       1024  thrpt   15  70.78x
> LongMaxVector.EXPAND_BITS         1024  thrpt   15  76.31x
>
>
> The "Gain" column is the ratio between the throughput of benchmark runs with this patch and that of benchmark runs without it. This patch does not change the performance of these operations on machines that do not support these instructions or on other architectures.
>
> This patch enables the generation of optimized SVE2 instructions for CompressBits and ExpandBits operations through the Vector API. In addition, together with the scalar implementation of CompressBits and ExpandBits added to the aarch64 backend by this commit - https://github.com/openjdk/jdk/commit/bbd8ae78200e4128d4eddf8694835956b5c5f142 - it also enables auto-vectorization of these nodes on aarch64 machines that support SVE2.
>
> The performance of the following benchmarks was measured with the master branch (auto-vectorization not enabled) and with this patch (auto-vectorization enabled) -
>
> ```
> @Benchmark
> public void testCompInt() {
>     for (int i = 0; i < length; i++) {
>         ir[i] = Integer.compress(ia[i], ib[i]);
>     }
> }
>
> @Benchmark
> public void testExpInt() {
>     for (int i = 0; i < length; i++) {
>         ir[i] = Integer.expand(ia[i], ib[i]);
>     }
> }
>
> @Benchmark
> public void testCompLong() {
>     for (int i = 0; i < length; i++) {
>         lr[i] = Long.compress(la[i], lb[i]);
>     }
> }
>
> @Benchmark
> public void testExpLong() {
>     for (int i = 0; i < length; i++) {
>         lr[i] = Long.expand(la[i], lb[i]);
>     }
> }
>
> Benchmark                           (length)   Mode  Cnt   Gain
> VectorCompressExpand.testCompInt        2048  thrpt   15  9.97x
> VectorCompressExpand.testCompLong       2048  thrpt   15  5.63x
> VectorCompressExpand.testExpInt         2048  thrpt   15  9.70x
> VectorCompressExpand.testExpLong        2048  thrpt   15  5.66x
>
> ```
>
> The "Gain" column is the ratio between the throughput with this patch and that of the master branch.
> With vectorization enabled (either through SuperWord or the Vector API), significant gains can be observed for CompressBits and ExpandBits.
>
> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java
> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java
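
For readers unfamiliar with the operations being vectorized, the following is a minimal sketch of what compress and expand do per 32-bit lane. It is not code from the patch: the class name `CompressExpandDemo` and the helpers `compressRef` and `expandRef` are hypothetical, written as a plain bit-by-bit model so that the block does not depend on a particular JDK version. On a JDK that provides `Integer.compress` and `Integer.expand`, those methods are expected to return the same values for the inputs shown.

```java
public class CompressExpandDemo {

    // Hypothetical reference model of the compress operation:
    // gather the bits of `value` selected by `mask` and pack them
    // contiguously into the low-order bits of the result.
    public static int compressRef(int value, int mask) {
        int result = 0;
        int outPos = 0; // next free low-order bit position in the result
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((value >>> bit) & 1) << outPos;
                outPos++;
            }
        }
        return result;
    }

    // Hypothetical reference model of the expand operation:
    // scatter the low-order bits of `value` to the bit positions
    // that are set in `mask`; all other result bits are zero.
    public static int expandRef(int value, int mask) {
        int result = 0;
        int inPos = 0; // next low-order bit of `value` to consume
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((value >>> inPos) & 1) << bit;
                inPos++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int value = 0xCAFEBABE;
        int mask = 0xFF00FFF0;

        int compressed = compressRef(value, mask);
        int expanded = expandRef(compressed, mask);

        System.out.println(Integer.toHexString(compressed)); // cabab
        System.out.println(Integer.toHexString(expanded));   // ca00bab0
    }
}
```

Expanding a compressed value under the same mask recovers exactly the masked bits of the original (`value & mask`). The loops above do per-bit work for every element, which gives some intuition for why a single hardware instruction per vector can produce gains of the size reported in the tables.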
This pull request has now been integrated.
Changeset: de1c12ed
Author: Bhavana Kilambi <bkilambi at openjdk.org>
Committer: Xiaohong Gong <xgong at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/de1c12ed636a43cc74b81c48cc987332fe341d7a
Stats: 259 lines in 9 files changed: 254 ins; 2 del; 3 mod
8301012: [vectorapi]: Intrinsify CompressBitsV/ExpandBitsV and add the AArch64 SVE backend implementation
Co-authored-by: Xiaohong Gong <xgong at openjdk.org>
Co-authored-by: Jatin Bhateja <jbhateja at openjdk.org>
Reviewed-by: ngasson, eliu, thartmann
-------------
PR: https://git.openjdk.org/jdk/pull/12446