Integrated: 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long
Chang Peng
duke at openjdk.org
Tue Feb 7 08:03:58 UTC 2023
On Thu, 12 Jan 2023 08:24:36 GMT, Chang Peng <duke at openjdk.org> wrote:
> x86 already implements the scalar intrinsics for the reverse() methods in
> java.lang.Integer and java.lang.Long. See JDK-8290034 [1].
>
> In this patch, we implement the AArch64 backend part
> using the `rbit` instruction [2].
>
> TestReverseBitsVector.java was introduced in [1] to verify the
> IR test results of auto-vectorization and mid-end optimizations.
> In this patch, we update it to test AArch64 as well.
>
> Tests:
> 1: These scalar intrinsics are covered by existing Jtreg cases,
> e.g. [3][4]. Hence, we don't add a new one in this patch.
> 2: tier1~3 pass on Linux/AArch64 and Linux/x86. There are no new failures.
> 3: All the vector test cases under the following directories pass on
> 128-bit and 256-bit SVE machines.
>
>
> test/hotspot/jtreg/compiler/vectorapi/
> test/jdk/jdk/incubator/vector/
> test/hotspot/jtreg/compiler/vectorization/
>
>
> 4: JMH case
> We initially use the JMH cases from [1] (i.e. Integers.reverse
> and Longs.reverse) to evaluate the performance uplift from
> enabling these scalar intrinsics. The data below shows
> speedups of about 5.7x and 6.3x respectively.
>
>
> Benchmark          (size)  Mode  Before       After        Units
> Integers.reverse   500     avgt  0.456±0.002  0.080±0.001  us/op
> Longs.reverse      500     avgt  0.898±0.009  0.142±0.001  us/op
>
>
> On closer analysis, we notice that the benefit comes from improved
> auto-vectorization (SLP). Note that the loops in the
> two benchmarks can be vectorized by SLP. Without the scalar intrinsics,
> the vector version of the Java implementation [5][6] would be generated.
> A code snippet of it is shown below.
>
>
>   and   v17.16b, v16.16b, v18.16b
>   ushr  v16.4s, v16.4s, #1
>   and   v16.16b, v16.16b, v18.16b
>   shl   v17.4s, v17.4s, #1
>   orr   v16.16b, v17.16b, v16.16b
>
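For reference, the five-instruction sequence above has the same shape as one swap step of a mask-and-shift bit reversal, which is how the pure-Java fallback in [5] works. The following is a minimal sketch of that scheme, not code from the patch: the names ReverseSketch and reverseSwar are hypothetical, and reading v18 as the 0x55555555 mask is an assumption based on the shift amount of 1.

```java
// Hypothetical sketch, not part of the patch: a mask-and-shift bit reversal
// in the style of the pure-Java Integer.reverse fallback.
final class ReverseSketch {
    static int reverseSwar(int i) {
        // Swap adjacent bits: two ANDs, one left shift, one unsigned right
        // shift and one OR, the same operation mix as the vector snippet.
        i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
        // Swap adjacent 2-bit pairs, then adjacent nibbles.
        i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
        i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
        // Finally reverse the byte order.
        return Integer.reverseBytes(i);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE};
        for (int s : samples) {
            if (reverseSwar(s) != Integer.reverse(s)) {
                throw new AssertionError("mismatch for " + s);
            }
        }
        System.out.println("reverseSwar agrees with Integer.reverse on all samples");
    }
}
```

Each of the three swap steps costs about five operations per vector, plus the byte reversal, whereas a single bit-reverse instruction replaces the whole chain.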
>
> With the introduction of the scalar intrinsics, ReverseI and ReverseL
> IR nodes can be created in the mid-end. As a result, SLP can generate
> a ReverseV node, i.e. an "rbitv" instruction, which is much
> more efficient than the previous instruction sequence. Hence, the
> introduction of these two scalar intrinsics lets SLP
> generate better code. This is an indirect effect of this patch.
>
> Furthermore, in order to evaluate the direct effect of the scalar
> intrinsics, we
> (1) evaluate a small test case which is not auto-vectorization friendly.
> (2) evaluate Integers.reverse and Longs.reverse in [1] with the JVM option
> "-XX:-UseSuperWord" to disable SLP.
>
> In both cases, we observe about 5x performance uplifts after enabling
> the scalar intrinsics.
>
>
> Benchmark                       (size)  Mode  Before       After        Units
> Integers.reverse (disable SLP)  500     avgt  1.072±0.002  0.212±0.001  us/op
> Longs.reverse (disable SLP)     500     avgt  1.073±0.002  0.212±0.001  us/op
>
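The small test case from (1) is not included in this message. As an illustration only, a loop that SLP should not be able to vectorize might carry a dependency through the reversed value, as in the sketch below. The names ChainedReverse and chain are hypothetical, and this is not the test case behind the numbers above.

```java
// Hypothetical example, not the test case used for the measurements above.
final class ChainedReverse {
    // Each iteration consumes the previous iteration's result, so the
    // iterations cannot be packed into independent vector lanes.
    static int chain(int seed, int iterations) {
        int x = seed;
        for (int i = 0; i < iterations; i++) {
            x = Integer.reverse(x) + i;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(chain(1, 1_000_000)));
    }
}
```

A loop of this shape isolates the cost of the scalar reverse itself, which is what approach (2) achieves for the existing benchmarks by passing "-XX:-UseSuperWord".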
>
> [1] https://bugs.openjdk.org/browse/JDK-8290034
> [2] https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RBIT--Reverse-Bits-?lang=en
> [3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L1228
> [4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/LongMaxVectorTests.java#L1250
> [5] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Integer.java#L1766
> [6] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Long.java#L1905
This pull request has now been integrated.
Changeset: 98433a2f
Author: Chang Peng <Chang.Peng at arm.com>
Committer: Eric Liu <eliu at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/98433a2f6e7fe97e03ed26673c9925d7b26466bf
Stats: 49 lines in 3 files changed: 40 ins; 0 del; 9 mod
8296999: AArch64: scalar intrinsics for reverse method in Integer and Long
Reviewed-by: eliu, ngasson
-------------
PR: https://git.openjdk.org/jdk/pull/11962