RFR: 8296999: AArch64: scalar intrinsics for reverse method in Integer and Long [v3]

Eric Liu eliu at openjdk.org
Tue Jan 31 01:35:58 UTC 2023


On Mon, 30 Jan 2023 03:03:24 GMT, Chang Peng <duke at openjdk.org> wrote:

>> x86 already implements the scalar intrinsics for the reverse() methods in
>>  java.lang.Integer and java.lang.Long. See JDK-8290034 [1].
>> 
>> In this patch, we implement the AArch64 backend part
>>  using the `rbit` instruction [2].
>> 
>> TestReverseBitsVector.java was introduced in [1] to verify the
>>  IR test results of auto-vectorization and mid-end optimizations.
>> In this patch, we update it to test AArch64 as well.
>> 
>> Tests:
>> 1: These scalar intrinsics are covered by existing Jtreg cases,
>>  e.g. [3][4]. Hence, we don't add a new one in this patch.
>> 2: tier1~3 pass on Linux/AArch64 and Linux/x86. There are no new failures.
>> 3: All the vector test cases under the following directories pass on
>>  128-bit and 256-bit SVE machines.
>> 
>> 
>>   test/hotspot/jtreg/compiler/vectorapi/
>>   test/jdk/jdk/incubator/vector/
>>   test/hotspot/jtreg/compiler/vectorization/
>> 
>> 
>> 4: JMH case
>> We initially use the JMH case from [1] (i.e. Integers.reverse
>>  and Longs.reverse) to evaluate the performance uplifts after
>> enabling these scalar intrinsics. From the data shown below,
>>  uplifts of about 5.7x and 6.3x can be observed respectively.
>> 
>> 
>> Benchmark              (size) Mode  Before      After       Units
>> Integers.reverse        500   avgt  0.456±0.002 0.080±0.001 us/op
>> Longs.reverse           500   avgt  0.898±0.009 0.142±0.001 us/op
>> 
>> 
>> With an in-depth analysis, we notice that the benefit comes from
>>  an auto-vectorization (SLP) improvement. Note that the loops in the
>>  two benchmarks can be vectorized by SLP. Without the scalar intrinsics,
>>  the vector version of the Java implementation [5][6] would be generated.
>>  Below is a code snippet of it.
>> 
>> 
>> and   v17.16b, v16.16b, v18.16b
>> ushr  v16.4s, v16.4s, #1
>> and   v16.16b, v16.16b, v18.16b
>> shl   v17.4s, v17.4s, #1
>> orr   v16.16b, v17.16b, v16.16b
>> 
>> 
>> With the introduction of the scalar intrinsics, ReverseI and ReverseL
>>  IR nodes can be created in the mid-end. As a result, SLP can generate
>>  a ReverseV node, i.e. emit the "rbitv" instruction, which is much
>>  more efficient than the previous instruction sequence. Hence, the
>>  introduction of these two scalar intrinsics allows SLP to generate
>>  better code. This is an indirect effect of this patch.
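
For readers following the analysis above, here is a minimal sketch of the kind of mask-and-shift Java code that the quoted vector sequence appears to correspond to. The class and method names (`ReverseSketch`, `reverseSwar`) are hypothetical and chosen only for illustration; the body follows the well-known swap-adjacent-bits technique and is not copied from the patch. Each line of the form "(i & mask) << n | (i >>> n) & mask" maps naturally to an and/ushr/and/shl/orr group like the one shown, whereas a single bit-reverse instruction replaces the whole sequence.

```java
public class ReverseSketch {
    // Hypothetical helper: bit reversal via three mask-and-shift swap steps
    // followed by a byte reversal. Illustrative only, not taken from the patch.
    static int reverseSwar(int i) {
        // Swap adjacent single bits.
        i = ((i & 0x55555555) << 1) | ((i >>> 1) & 0x55555555);
        // Swap adjacent 2-bit pairs.
        i = ((i & 0x33333333) << 2) | ((i >>> 2) & 0x33333333);
        // Swap adjacent 4-bit nibbles.
        i = ((i & 0x0f0f0f0f) << 4) | ((i >>> 4) & 0x0f0f0f0f);
        // Reverse the order of the four bytes.
        return Integer.reverseBytes(i);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE};
        for (int s : samples) {
            // The sketch should agree with the library method on every input.
            if (reverseSwar(s) != Integer.reverse(s)) {
                throw new AssertionError("mismatch for " + s);
            }
        }
        System.out.println("reverseSwar agrees with Integer.reverse");
    }
}
```

When the JIT recognizes `Integer.reverse` as an intrinsic, the call can be represented by a single IR node instead of the expanded shifts and masks, which is what gives SLP the chance to pick a vector bit-reverse.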
>> 
>> Furthermore, in order to evaluate the direct effect of the scalar
>>  intrinsics, we
>> (1) evaluate a small test case which is not auto-vectorization friendly.
>> (2) evaluate Integers.reverse and Longs.reverse in [1] with JVM option
>>  "-XX:-UseSuperWord" to disable SLP.
>> 
>> In both cases, we observe about 5x performance uplifts after enabling
>>  the scalar intrinsics.
>> 
>> 
>> Benchmark              (size) Mode  Before      After       Units
>> Integers.reverse        500   avgt  1.072±0.002 0.212±0.001 us/op
>> (disable SLP)
>> Longs.reverse           500   avgt  1.073±0.002 0.212±0.001 us/op
>> (disable SLP)
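
The small test case mentioned in (1) is not shown in the description. As a rough illustration only, a loop that is not auto-vectorization friendly might look like the sketch below: each iteration depends on the previous result, so the iterations cannot simply be packed into vector lanes. The class and method names are hypothetical, and this is an assumption about what such a test could look like, not the actual test used for the measurements.

```java
public class ScalarReverseSketch {
    // Hypothetical kernel with a loop-carried dependency through 'acc'.
    // Every iteration needs the previous Integer.reverse result, so the
    // scalar reverse is expected to stay on the critical path.
    static int chainedReverse(int[] data) {
        int acc = 0;
        for (int v : data) {
            acc = Integer.reverse(acc ^ v);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = new int[500];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 31 + 7;
        }
        System.out.println(chainedReverse(data));
    }
}
```

A kernel of this shape would exercise the scalar instruction directly, which is the same effect the "-XX:-UseSuperWord" runs isolate for the original benchmarks.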
>> 
>> 
>> [1] https://bugs.openjdk.org/browse/JDK-8290034
>> [2] https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RBIT--Reverse-Bits-?lang=en
>> [3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L1228
>> [4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/LongMaxVectorTests.java#L1250
>> [5] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Integer.java#L1766
>> [6] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Long.java#L1905
>
> Chang Peng has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   Writing m4 macro to merge two matching rules.
>   
>   Change-Id: I4f68c10493cb3f62e0a4c822089511dd0506cdc0

LGTM.

-------------

Marked as reviewed by eliu (Committer).

PR: https://git.openjdk.org/jdk/pull/11962

