RFR: 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long [v2]
Chang Peng
duke at openjdk.org
Mon Jan 30 02:59:39 UTC 2023
> x86 implemented the scalar intrinsics for the reverse() method in
> java.lang.Integer and java.lang.Long. See JDK-8290034 [1].
>
> In this patch, we implement the AArch64 backend part
> using the `rbit` instruction [2].
>
> TestReverseBitsVector.java was introduced in [1] to verify the
> IR test results of auto-vectorization and mid-end optimizations.
> In this patch, we update it to test AArch64 as well.
>
> Tests:
> 1: These scalar intrinsics can be covered by existing Jtreg cases,
> e.g. [3][4]. Hence, we don't add a new one in this patch.
> 2: tier1~3 pass on Linux/AArch64 and Linux/x86. There are no new failures.
> 3: All the vector test cases under the following directories pass on
> 128-bit and 256-bit SVE machines.
>
>
> test/hotspot/jtreg/compiler/vectorapi/
> test/jdk/jdk/incubator/vector/
> test/hotspot/jtreg/compiler/vectorization/
>
>
> 4: JMH case
> We initially use the JMH case from [1] (i.e. Integers.reverse
> and Longs.reverse) to evaluate the performance uplifts after
> enabling these scalar intrinsics. From the data shown below,
> about 5x and 6x performance uplifts can be perceived respectively.
>
>
> Benchmark         (size)  Mode  Before       After        Units
> Integers.reverse  500     avgt  0.456±0.002  0.080±0.001  us/op
> Longs.reverse     500     avgt  0.898±0.009  0.142±0.001  us/op
>
>
> With an in-depth analysis, we notice that the benefit comes from
> auto-vectorization (SLP) improvement. Note that the loops in the
> two benchmarks can be vectorized by SLP. Without the scalar intrinsics,
> the vector version of the Java implementation [5][6] would be generated.
> Below is a code snippet of it.
>
>
> and v17.16b, v16.16b, v18.16b
> ushr v16.4s, v16.4s, #1
> and v16.16b, v16.16b, v18.16b
> shl v17.4s, v17.4s, #1
> orr v16.16b, v17.16b, v16.16b
>
>
> With the introduction of scalar intrinsics, ReverseI and ReverseL
> IR nodes can be created at mid-end. As a result, SLP can generate a
> ReverseV node, i.e. the "rbitv" instruction, which is much more
> efficient than the previous instruction sequence. Hence, the
> introduction of these two scalar intrinsics enables SLP to generate
> better code. This is an indirect effect of this patch.
>
> Furthermore, in order to evaluate the direct effect of the scalar
> intrinsics, we
> (1) evaluate a small test case which is not auto-vectorization friendly.
> (2) evaluate Integers.reverse and Longs.reverse in [1] with JVM option
> "-XX:-UseSuperWord" to disable SLP.
>
> In both cases, we observe about 5x performance uplifts after enabling
> the scalar intrinsics.
>
>
> Benchmark         (size)  Mode  Before       After        Units
> Integers.reverse  500     avgt  1.072±0.002  0.212±0.001  us/op
> (disable SLP)
> Longs.reverse     500     avgt  1.073±0.002  0.212±0.001  us/op
> (disable SLP)
>
>
> [1] https://bugs.openjdk.org/browse/JDK-8290034
> [2] https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RBIT--Reverse-Bits-?lang=en
> [3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L1228
> [4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/LongMaxVectorTests.java#L1250
> [5] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Integer.java#L1766
> [6] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Long.java#L1905
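For readers following the quoted analysis, the shift-and-mask vector sequence shown above (`and`, `ushr`, `and`, `shl`, `orr`) matches one swap step of a pure-Java bit reversal. The following is a minimal sketch of that style of implementation, not a copy of the JDK sources referenced in [5][6]; the class and method names (`ReverseSketch`, `reverseBySwaps`) are hypothetical and exist only for illustration.

```java
// Illustrative sketch only: class and method names are hypothetical.
class ReverseSketch {
    // Reverse the bits of an int by swapping adjacent bits, then bit pairs,
    // then nibbles, and finally reversing the byte order.
    static int reverseBySwaps(int i) {
        i = ((i & 0x55555555) << 1) | ((i >>> 1) & 0x55555555); // swap adjacent bits
        i = ((i & 0x33333333) << 2) | ((i >>> 2) & 0x33333333); // swap bit pairs
        i = ((i & 0x0f0f0f0f) << 4) | ((i >>> 4) & 0x0f0f0f0f); // swap nibbles
        return Integer.reverseBytes(i);                         // swap bytes
    }

    // Same idea for long, with 64-bit masks.
    static long reverseBySwaps(long l) {
        l = ((l & 0x5555555555555555L) << 1) | ((l >>> 1) & 0x5555555555555555L);
        l = ((l & 0x3333333333333333L) << 2) | ((l >>> 2) & 0x3333333333333333L);
        l = ((l & 0x0f0f0f0f0f0f0f0fL) << 4) | ((l >>> 4) & 0x0f0f0f0f0f0f0f0fL);
        return Long.reverseBytes(l);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE};
        for (int s : samples) {
            // The library method is the reference result.
            if (reverseBySwaps(s) != Integer.reverse(s)) {
                throw new AssertionError("mismatch for " + s);
            }
        }
        System.out.println("all samples match Integer.reverse");
    }
}
```

Each swap step costs several logical and shift operations, which is why replacing the whole sequence with a single bit-reverse instruction is attractive.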
Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
- Merge branch 'openjdk:master' into add_reverse_bit
- 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long
x86 implemented the scalar intrinsics for the reverse() method in
java.lang.Integer and java.lang.Long. See JDK-8290034 [1].
In this patch, we implement the AArch64 backend part
using the `rbit` instruction [2].
TestReverseBitsVector.java was introduced in [1] to verify the
IR test results of auto-vectorization and mid-end optimizations.
In this patch, we update it to test AArch64 as well.
Tests:
1: These scalar intrinsics can be covered by existing Jtreg cases,
e.g. [3][4]. Hence, we don't add a new one in this patch.
2: tier1~3 pass on Linux/AArch64 and Linux/x86. There are no new failures.
3: All the vector test cases under the following directories pass on
128-bit and 256-bit SVE machines.
```
test/hotspot/jtreg/compiler/vectorapi/
test/jdk/jdk/incubator/vector/
test/hotspot/jtreg/compiler/vectorization/
```
4: JMH results: we initially use the JMH case from [1] (i.e.
Integers.reverse and Longs.reverse) to evaluate the performance
uplifts after enabling these scalar intrinsics. From the data
shown below, about 5x and 6x performance uplifts can be obtained
respectively. However, the benefit comes from auto-vectorization (SLP).
That is, a ReverseV node can be generated after enabling the newly
added scalar intrinsics.
Therefore, in order to evaluate the scalar intrinsics themselves, we first
evaluate the performance on test cases designed to be not
auto-vectorization friendly. Then, we evaluate the performance
on Integers.reverse and Longs.reverse when using the JVM option
"-XX:-UseSuperWord" to disable SLP. In both cases, we get about
5x performance uplifts.
```
Benchmark         (size)  Mode  Before       After        Units
Integers.reverse  500     avgt  0.456±0.002  0.080±0.001  us/op
Longs.reverse     500     avgt  0.898±0.009  0.142±0.001  us/op
Integers.reverse  500     avgt  1.072±0.002  0.212±0.001  us/op
(disable SLP)
Longs.reverse     500     avgt  1.073±0.002  0.212±0.001  us/op
(disable SLP)
```
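The test cases "designed to be not auto-vectorization friendly" are not included in this message. As a hypothetical illustration of what such a case could look like, the sketch below chains each `Integer.reverse` call on the previous result, so the loop carries a dependency that SLP should not be expected to vectorize. The class name, method name, and input data are assumptions made for this example; only the array size of 500 mirrors the benchmark parameter above.

```java
// Hypothetical example: not the test case used for the measurements above.
class DependentReverseLoop {
    // Each iteration consumes the result of the previous one, so the
    // iterations cannot simply be packed into independent vector lanes.
    static int chainedReverse(int[] data) {
        int acc = 0;
        for (int value : data) {
            acc = Integer.reverse(acc + value);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = new int[500]; // same size as the JMH parameter
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 31 + 7;  // arbitrary deterministic input
        }
        System.out.println(chainedReverse(data));
    }
}
```

With a loop of this shape, any speedup has to come from the scalar intrinsic itself rather than from a vectorized ReverseV node.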
[1] https://bugs.openjdk.org/browse/JDK-8290034
[2] https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RBIT--Reverse-Bits-?lang=en
[3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L1228
[4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/LongMaxVectorTests.java#L1250
TEST_LABEL: x86_64&&ubuntu&&conformance
JDK_SCOPE: hotspot:compiler/vectorization/TestReverseBitsVector.java
Jira: ENTLLT-5736
CustomizedGitHooks: yes
Change-Id: Ic6620d81e787def391d19db07fce53e1e82a0e43
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/11962/files
- new: https://git.openjdk.org/jdk/pull/11962/files/550634e7..f8cf8c1f
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=11962&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=11962&range=00-01
Stats: 41719 lines in 2013 files changed: 16962 ins; 6221 del; 18536 mod
Patch: https://git.openjdk.org/jdk/pull/11962.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11962/head:pull/11962
PR: https://git.openjdk.org/jdk/pull/11962