Integrated: 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long

Tue Feb 7 08:03:58 UTC 2023

On Thu, 12 Jan 2023 08:24:36 GMT, Chang Peng <duke at openjdk.org> wrote:

> x86 already implements the scalar intrinsics for the reverse() methods
>  in java.lang.Integer and java.lang.Long. See JDK-8290034 [1].
> 
> In this patch, we implement the AArch64 backend part
>  using the `rbit` instruction [2].
> 
> TestReverseBitsVector.java was introduced in [1] to verify the
>  IR test results of auto-vectorization and mid-end optimizations.
> In this patch, we update it to test AArch64 as well.
> 
> Tests:
> 1: These scalar intrinsics are covered by existing Jtreg cases,
>  e.g. [3][4]. Hence, we don't add a new one in this patch.
> 2: tier1~3 pass on Linux/AArch64 and Linux/x86. There are no new failures.
> 3: All the vector test cases under the following directories pass on
>  128-bit and 256-bit SVE machines.
> 
> 
>   test/hotspot/jtreg/compiler/vectorapi/
>   test/jdk/jdk/incubator/vector/
>   test/hotspot/jtreg/compiler/vectorization/
> 
> 
> 4: JMH case
> We initially used the JMH cases from [1] (i.e. Integers.reverse
>  and Longs.reverse) to evaluate the performance uplift after
> enabling these scalar intrinsics. The data below shows uplifts of
>  about 5x and 6x respectively.
> 
> 
> Benchmark              (size) Mode  Before      After       Units
> Integers.reverse        500   avgt  0.456±0.002 0.080±0.001 us/op
> Longs.reverse           500   avgt  0.898±0.009 0.142±0.001 us/op
> 
> 
> On closer analysis, we noticed that this benefit comes from
>  improved auto-vectorization (SLP). Note that the loops in the
>  two benchmarks can be vectorized by SLP. Without the scalar intrinsics,
>  the vector version of the Java implementation [5][6] would be generated.
>  A code snippet of it is shown below.
> 
> 
> and   v17.16b, v16.16b, v18.16b
> ushr  v16.4s, v16.4s, #1
> and   v16.16b, v16.16b, v18.16b
> shl   v17.4s, v17.4s, #1
> orr   v16.16b, v17.16b, v16.16b
> 
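The five instructions above appear to be one swap step of a shift-and-mask bit reversal (presumably the step that exchanges adjacent bits, given the shift amount of #1). The sketch below is modeled on that general approach; the class and method names are hypothetical, and it is not a copy of the linked JDK source [5].

```java
final class ReverseSketch {
    // One swap step: exchange neighbouring groups of `shift` bits.
    // `mask` selects the low group of each pair.
    static int swapStep(int x, int mask, int shift) {
        return ((x & mask) << shift) | ((x >>> shift) & mask);
    }

    // Full 32-bit reversal built from five swap steps.
    static int reverseBits(int x) {
        x = swapStep(x, 0x55555555, 1);   // adjacent bits (matches the shift of 1 above)
        x = swapStep(x, 0x33333333, 2);   // bit pairs
        x = swapStep(x, 0x0f0f0f0f, 4);   // nibbles
        x = swapStep(x, 0x00ff00ff, 8);   // bytes
        x = swapStep(x, 0x0000ffff, 16);  // half-words
        return x;
    }
}
```

Each step costs several logical and shift operations, whereas a single `rbit` performs the whole reversal, which is consistent with the uplift reported in the tables.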
> 
> With the introduction of scalar intrinsics, ReverseI and ReverseL
>  IR nodes can be created in the mid-end. As a result, SLP can generate
>  a ReverseV node, which is emitted as the "rbitv" instruction and is much
>  more efficient than the previous instruction sequence. Hence, the
>  introduction of these two scalar intrinsics lets SLP generate better
>  code. This is an indirect effect of this patch.
> 
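For illustration, a loop of the following shape is the kind SLP can pack into vector lanes, since every iteration is independent. This is a hypothetical sketch, not the benchmark source from [1]; whether C2 actually vectorizes it depends on the JVM build, flags, and hardware.

```java
final class VectorizableReverseLoop {
    // Independent element-wise work: no iteration reads a result
    // produced by another iteration.
    // Assumes dst.length >= src.length.
    static void reverseAll(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.reverse(src[i]);
        }
    }
}
```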
> Furthermore, in order to evaluate the direct effect of the scalar
>  intrinsics, we
> (1) evaluate a small test case which is not auto-vectorization friendly.
> (2) evaluate Integers.reverse and Longs.reverse in [1] with the JVM option
>  "-XX:-UseSuperWord" to disable SLP.
> 
> In both cases, we observe about 5x performance uplifts after enabling
>  the scalar intrinsics.
> 
> 
> Benchmark              (size) Mode  Before      After       Units
> Integers.reverse        500   avgt  1.072±0.002 0.212±0.001 us/op
> (disable SLP)
> Longs.reverse           500   avgt  1.073±0.002 0.212±0.001 us/op
> (disable SLP)
> 
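The small test case from (1) is not included in the message. As a hypothetical example of a loop that is not auto-vectorization friendly, the sketch below carries a dependency from one iteration to the next, so each Integer.reverse call has to wait for the previous result and the scalar intrinsic is measured directly.

```java
final class ScalarReverseLoop {
    // Loop-carried dependency on `acc`: iteration i needs the value
    // computed in iteration i-1, so the calls cannot run in separate lanes.
    static int chainedReverse(int[] data) {
        int acc = 0;
        for (int v : data) {
            acc = Integer.reverse(acc ^ v);
        }
        return acc;
    }
}
```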
> 
> [1] https://bugs.openjdk.org/browse/JDK-8290034
> [2] https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RBIT--Reverse-Bits-?lang=en
> [3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L1228
> [4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/LongMaxVectorTests.java#L1250
> [5] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Integer.java#L1766
> [6] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Long.java#L1905

This pull request has now been integrated.

Changeset: 98433a2f
Author:    Chang Peng <Chang.Peng at arm.com>
Committer: Eric Liu <eliu at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/98433a2f6e7fe97e03ed26673c9925d7b26466bf
Stats:     49 lines in 3 files changed: 40 ins; 0 del; 9 mod

8296999: AArch64: scalar intrinsics for reverse method in Integer and Long

Reviewed-by: eliu, ngasson

-------------

PR: https://git.openjdk.org/jdk/pull/11962