RFR: 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long
Chang Peng
duke at openjdk.org
Fri Jan 27 22:40:58 UTC 2023
The x86 backend already implements the scalar intrinsics for the reverse()
methods in java.lang.Integer and java.lang.Long. See JDK-8290034 [1].
In this patch, we implement the AArch64 backend part
using the `rbit` instruction [2].
TestReverseBitsVector.java was introduced in [1] to verify the
IR test results of auto-vectorization and mid-end optimizations.
In this patch, we update it to test AArch64 as well.
Tests:
1: These scalar intrinsics are covered by existing jtreg cases,
e.g. [3][4]. Hence, we don't add a new one in this patch.
2: tier1~3 pass on Linux/AArch64 and Linux/x86. There are no new failures.
3: All the vector test cases under the following directories pass on
128-bit and 256-bit SVE machines.
test/hotspot/jtreg/compiler/vectorapi/
test/jdk/jdk/incubator/vector/
test/hotspot/jtreg/compiler/vectorization/
4: JMH case
We first use the JMH cases from [1] (i.e. Integers.reverse
and Longs.reverse) to evaluate the performance uplift from
enabling these scalar intrinsics. The data below shows
uplifts of about 5x and 6x respectively.
Benchmark         (size)  Mode  Before       After        Units
Integers.reverse  500     avgt  0.456±0.002  0.080±0.001  us/op
Longs.reverse     500     avgt  0.898±0.009  0.142±0.001  us/op
A closer analysis shows that the benefit comes from an
auto-vectorization (SLP) improvement. Note that the loops in the
two benchmarks can be vectorized by SLP. Without the scalar intrinsics,
the vector version of the Java implementation [5][6] is generated.
Below is a snippet of that code:
and v17.16b, v16.16b, v18.16b
ushr v16.4s, v16.4s, #1
and v16.16b, v16.16b, v18.16b
shl v17.4s, v17.4s, #1
orr v16.16b, v17.16b, v16.16b
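For readers who do not have [5] open, the following is a minimal sketch of the mask-and-shift approach that the snippet above appears to correspond to. The class name ReverseSketch is hypothetical, and the body is written from memory of the well-known bit-swap scheme rather than copied from the JDK sources, so treat it as an illustration only. Each line swaps neighbouring groups of bits with an and/shift/and/shift/or pattern, which is the same shape as the five vector instructions shown for the 1-bit step.

```java
// Hypothetical illustration class; not part of the patch.
final class ReverseSketch {
    // Reverses the bit order of a 32-bit value using mask-and-shift steps.
    static int reverse(int i) {
        // Swap adjacent single bits (assumed to match the and/ushr/and/shl/orr snippet).
        i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
        // Swap adjacent 2-bit groups.
        i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
        // Swap adjacent 4-bit groups (nibbles).
        i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
        // Finally reverse the byte order.
        return Integer.reverseBytes(i);
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        // Compare the sketch against the library method.
        System.out.println(reverse(x) == Integer.reverse(x));
    }
}
```

Without an intrinsic, every one of these steps has to be emitted as separate instructions, whether in scalar or vector form.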
With the introduction of the scalar intrinsics, ReverseI and ReverseL
IR nodes are created in the mid-end. As a result, SLP can generate a
ReverseV node, i.e. the "rbitv" instruction, which is much
more efficient than the previous instruction sequence. Hence, the
introduction of these two scalar intrinsics lets SLP
generate better code. This is an indirect effect of this patch.
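As an illustration of the kind of loop this applies to, here is a minimal sketch of an element-wise loop over arrays. The class and method names are hypothetical and this is not the benchmark code from [1]; whether a given loop is actually vectorized depends on the JIT and the hardware.

```java
// Hypothetical illustration class; not the JMH benchmark from [1].
final class ReverseLoopSketch {
    // Element-wise loop with no cross-iteration dependency: each output
    // element depends only on the input element at the same index.
    static void reverseAll(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.reverse(src[i]);
        }
    }

    public static void main(String[] args) {
        int[] src = new int[500];
        for (int i = 0; i < src.length; i++) {
            src[i] = i * 31 + 7;
        }
        int[] dst = new int[src.length];
        reverseAll(src, dst);
        // Reversing twice should give back the original value.
        System.out.println(Integer.reverse(dst[499]) == src[499]);
    }
}
```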
Furthermore, to evaluate the direct effect of the scalar
intrinsics, we
(1) evaluate a small test case which is not auto-vectorization friendly.
(2) evaluate Integers.reverse and Longs.reverse in [1] with the JVM option
"-XX:-UseSuperWord" to disable SLP.
In both cases, we observe about 5x performance uplifts after enabling
the scalar intrinsics.
Benchmark                       (size)  Mode  Before       After        Units
Integers.reverse (disable SLP)  500     avgt  1.072±0.002  0.212±0.001  us/op
Longs.reverse (disable SLP)     500     avgt  1.073±0.002  0.212±0.001  us/op
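The small test case used for (1) is not shown in this mail. As a rough sketch of what a loop that is not auto-vectorization friendly could look like, the hypothetical example below carries a dependency from one iteration to the next, so each reverse must wait for the previous result. The class and method names are assumptions made for illustration only.

```java
// Hypothetical illustration class; the actual test case for (1) is not shown.
final class ChainedReverseSketch {
    // Each iteration consumes the result of the previous one, so the
    // iterations cannot be computed independently of each other.
    static int chainedReverse(int[] data) {
        int acc = 0;
        for (int v : data) {
            acc = Integer.reverse(acc ^ v);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = new int[500];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 17 + 3;
        }
        System.out.println(Integer.toHexString(chainedReverse(data)));
    }
}
```

In a loop like this, any speedup would come from the scalar intrinsic itself rather than from SLP.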
[1] https://bugs.openjdk.org/browse/JDK-8290034
[2] https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RBIT--Reverse-Bits-?lang=en
[3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L1228
[4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/LongMaxVectorTests.java#L1250
[5] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Integer.java#L1766
[6] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Long.java#L1905
-------------
Commit messages:
- 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long
Changes: https://git.openjdk.org/jdk/pull/11962/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11962&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8296999
Stats: 31 lines in 2 files changed: 23 ins; 0 del; 8 mod
Patch: https://git.openjdk.org/jdk/pull/11962.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11962/head:pull/11962
PR: https://git.openjdk.org/jdk/pull/11962