RFR: 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long [v3]
Nick Gasson
ngasson at openjdk.org
Mon Feb 6 17:56:50 UTC 2023
On Mon, 30 Jan 2023 03:03:24 GMT, Chang Peng <duke at openjdk.org> wrote:
>> x86 has implemented scalar intrinsics for the reverse() methods in
>> java.lang.Integer and java.lang.Long. See JDK-8290034 [1].
>>
>> In this patch, we implement the AArch64 backend part
>> using the `rbit` instruction [2].
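For reference, `rbit` reverses the bit order of a 32-bit W register or a 64-bit X register, which matches the contract of `Integer.reverse` and `Long.reverse`. The sketch below is a plain bit-by-bit model of that operation, written only for illustration (the class `RbitModel` and its methods are hypothetical, not part of the patch), and checks itself against the JDK methods:

```java
public class RbitModel {
    // Bit-by-bit model of a 32-bit rbit: bit k of the input ends up at bit 31-k.
    static int rbit32(int x) {
        int r = 0;
        for (int k = 0; k < 32; k++) {
            r = (r << 1) | ((x >>> k) & 1);
        }
        return r;
    }

    // Same model for a 64-bit rbit: bit k of the input ends up at bit 63-k.
    static long rbit64(long x) {
        long r = 0L;
        for (int k = 0; k < 64; k++) {
            r = (r << 1) | ((x >>> k) & 1L);
        }
        return r;
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, 0x80000000, 0x12345678, -1};
        for (int v : ints) {
            if (rbit32(v) != Integer.reverse(v)) throw new AssertionError(v);
        }
        long[] longs = {0L, 1L, Long.MIN_VALUE, 0x0123456789ABCDEFL, -1L};
        for (long v : longs) {
            if (rbit64(v) != Long.reverse(v)) throw new AssertionError(v);
        }
        System.out.println(Integer.toHexString(rbit32(1))); // prints 80000000
    }
}
```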
>>
>> TestReverseBitsVector.java was introduced in [1] to verify, through
>> IR tests, the results of auto-vectorization and mid-end optimizations.
>> In this patch, we update it to cover AArch64 as well.
>>
>> Tests:
>> 1: These scalar intrinsics are already covered by existing jtreg cases,
>> e.g. [3][4]. Hence, we don't add a new one in this patch.
>> 2: tier1~3 pass on Linux/AArch64 and Linux/x86. There are no new failures.
>> 3: All the vector test cases under the following directories pass on
>> 128-bit and 256-bit SVE machines.
>>
>>
>> test/hotspot/jtreg/compiler/vectorapi/
>> test/jdk/jdk/incubator/vector/
>> test/hotspot/jtreg/compiler/vectorization/
>>
>>
>> 4: JMH case
>> We first use the JMH benchmarks from [1] (i.e. Integers.reverse
>> and Longs.reverse) to evaluate the performance uplift from
>> enabling these scalar intrinsics. As the data below shows, the two
>> benchmarks run roughly 5.7x and 6.3x faster, respectively.
>>
>>
>> Benchmark         (size)  Mode  Before (us/op)  After (us/op)
>> Integers.reverse     500  avgt   0.456 ± 0.002  0.080 ± 0.001
>> Longs.reverse        500  avgt   0.898 ± 0.009  0.142 ± 0.001
>>
>>
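The benchmark source from [1] is not reproduced in this thread. As an assumed stand-in only, the loops it measures have roughly the following shape: independent per-element calls to `Integer.reverse` / `Long.reverse` over an array, which is what makes them candidates for SLP. The class `ReverseLoop` and method `reverseAll` are hypothetical names; only the size of 500 comes from the tables.

```java
import java.util.Random;

public class ReverseLoop {
    // Hypothetical kernel: each iteration is independent of the others,
    // so C2's SuperWord (SLP) pass may pack several iterations into one vector op.
    static void reverseAll(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.reverse(src[i]);
        }
    }

    // The long variant, mirroring Longs.reverse.
    static void reverseAll(long[] src, long[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Long.reverse(src[i]);
        }
    }

    public static void main(String[] args) {
        int size = 500; // same value as the (size) column above
        int[] ints = new int[size];
        int[] intOut = new int[size];
        long[] longs = new long[size];
        long[] longOut = new long[size];
        Random rnd = new Random(42);
        for (int i = 0; i < size; i++) {
            ints[i] = rnd.nextInt();
            longs[i] = rnd.nextLong();
        }
        reverseAll(ints, intOut);
        reverseAll(longs, longOut);
        // Reversing twice must give back the original value.
        for (int i = 0; i < size; i++) {
            if (Integer.reverse(intOut[i]) != ints[i]) throw new AssertionError(i);
            if (Long.reverse(longOut[i]) != longs[i]) throw new AssertionError(i);
        }
        System.out.println("ok");
    }
}
```

A real measurement would wrap such a kernel in a JMH `@Benchmark` method; plain `main` is used here only to keep the sketch self-contained.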
>> On closer analysis, we found that the benefit comes from improved
>> auto-vectorization (SLP): the loops in both benchmarks can be
>> vectorized by SLP. Without the scalar intrinsics, SLP generates a
>> vector version of the Java implementation [5][6]; a snippet of the
>> generated code is shown below.
>>
>>
>> and v17.16b, v16.16b, v18.16b
>> ushr v16.4s, v16.4s, #1
>> and v16.16b, v16.16b, v18.16b
>> shl v17.4s, v17.4s, #1
>> orr v16.16b, v17.16b, v16.16b
>>
>>
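The snippet above is one mask-and-shift swap step applied to four `int` lanes at once: `and`, `ushr #1`, `and`, `shl #1`, `orr` compute `(x & m) << 1 | (x >>> 1) & m`. The scalar sketch below shows that Hacker's Delight-style technique for illustration; it is modeled on the algorithm family behind [5], not copied from it, and the class `SwapReverse` and its helper `swap` are hypothetical.

```java
public class SwapReverse {
    // One swap step: exchange adjacent groups of `shift` bits selected by `mask`.
    // With shift = 1 this is the and/ushr/and/shl/orr sequence in the snippet.
    static int swap(int x, int mask, int shift) {
        return (x & mask) << shift | (x >>> shift) & mask;
    }

    // Reverse the bits inside each byte with three swap steps,
    // then reverse the order of the four bytes.
    static int reverse(int x) {
        x = swap(x, 0x55555555, 1); // swap adjacent bits
        x = swap(x, 0x33333333, 2); // swap adjacent bit pairs
        x = swap(x, 0x0F0F0F0F, 4); // swap adjacent nibbles
        return Integer.reverseBytes(x); // swap bytes end-for-end
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, 0xCAFEBABE};
        for (int v : samples) {
            if (reverse(v) != Integer.reverse(v)) throw new AssertionError(v);
        }
        System.out.println("ok");
    }
}
```

Each of the three swap steps costs five instructions, plus a byte reversal, whereas a recognized ReverseI node can be matched to a single `rbit`.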
>> With the scalar intrinsics in place, ReverseI and ReverseL IR nodes
>> can be created in the mid-end. As a result, SLP can generate ReverseV
>> nodes, i.e. emit the "rbitv" instruction, which is much more efficient
>> than the previous instruction sequence. In other words, these two
>> scalar intrinsics let SLP generate better code. This is an indirect
>> effect of the patch.
>>
>> Furthermore, to evaluate the direct effect of the scalar
>> intrinsics, we
>> (1) evaluate a small test case that is not auto-vectorization friendly, and
>> (2) evaluate Integers.reverse and Longs.reverse from [1] with the JVM option
>> "-XX:-UseSuperWord" to disable SLP.
>>
>> In both cases, we observe about a 5x performance uplift after enabling
>> the scalar intrinsics.
>>
>>
>> Benchmark                        (size)  Mode  Before (us/op)  After (us/op)
>> Integers.reverse (SLP disabled)     500  avgt   1.072 ± 0.002  0.212 ± 0.001
>> Longs.reverse (SLP disabled)        500  avgt   1.073 ± 0.002  0.212 ± 0.001
>>
>>
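The small non-vectorizable test case from (1) is not shown in the PR. As a hypothetical illustration of that kind of case, the loop below carries a dependence from one iteration to the next, so SuperWord should not be able to pack its iterations into vector lanes, and any speedup has to come from the scalar `rbit` path. The class `ChainedReverse` and method `chain` are invented for this sketch.

```java
public class ChainedReverse {
    // Hypothetical kernel: each iteration consumes the previous result,
    // so the iterations cannot be executed side by side in vector lanes.
    static int chain(int[] a) {
        int acc = 0;
        for (int x : a) {
            acc = Integer.reverse(acc ^ x);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        System.out.println(Integer.toHexString(chain(a))); // prints 40000002
    }
}
```

Case (2) takes the opposite approach: it keeps a vectorizable loop but turns SLP off with "-XX:-UseSuperWord", so again only the scalar intrinsic can contribute.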
>> [1] https://bugs.openjdk.org/browse/JDK-8290034
>> [2] https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RBIT--Reverse-Bits-?lang=en
>> [3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L1228
>> [4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/LongMaxVectorTests.java#L1250
>> [5] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Integer.java#L1766
>> [6] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Long.java#L1905
>
> Chang Peng has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> Writing m4 macro to merge two matching rules.
>
> Change-Id: I4f68c10493cb3f62e0a4c822089511dd0506cdc0
Looks OK to me.
-------------
Marked as reviewed by ngasson (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11962
More information about the hotspot-compiler-dev
mailing list