RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v3]
Xiaohong Gong
xgong at openjdk.org
Mon Mar 17 07:38:57 UTC 2025
On Mon, 17 Mar 2025 07:01:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>>
>> - Merge branch 'jdk:master' into JDK-8350463
>> - Add the IR test
>> - 8350463: AArch64: Add vector rearrange support for small lane count vectors
>>
>> The AArch64 vector rearrange implementation currently lacks support for
>> vector types with lane counts < 4 (see [1]). This limitation results in
>> significant performance gaps when running Long/Double vector benchmarks
>> on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to
>> other SVE and x86 platforms.
>>
>> Vector rearrange operations depend on vector shuffle inputs, which
>> previously used a byte array as payload. The minimum vector lane count
>> of 4 for the byte type on AArch64 imposed this limitation on rearrange
>> operations. However, the vector shuffle payload has since been updated
>> to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]).
>> This change enables us to remove the lane count restriction for vector
>> rearrange operations.
>>
>> This patch adds rearrange support for vector types with small lane
>> counts. Here are the main changes:
>> - Added AArch64 match rule support for `VectorRearrange` with smaller
>> lane counts (e.g., `2D/2S`)
>> - Relocated the NEON implementation from the ad file to the c2 macro
>> assembler file to better handle the more complex implementation
>> - Optimized temporary register usage in NEON implementation for
>> short/int/float types from two registers to one
>>
>> The following is performance data for several Vector API JMH
>> benchmarks on an NVIDIA Grace CPU with NEON and SVE. Performance of the
>> same benchmarks with other vector types remains unchanged.
>>
>> 1) NEON
>>
>> JMH on panama-vector:vectorIntrinsics:
>> ```
>> Benchmark                     (size)  Mode   Cnt  Units   Before  After     Gain
>> Double128Vector.rearrange     1024    thrpt  30   ops/ms  78.060  578.859   7.42x
>> Double128Vector.sliceUnary    1024    thrpt  30   ops/ms  72.332  1811.664  25.05x
>> Double128Vector.unsliceUnary  1024    thrpt  30   ops/ms  72.256  1812.344  25.08x
>> Float64Vector.rearrange       1024    thrpt  30   ops/ms  77.879  558.797   7.18x
>> Float64Vector.sliceUnary      1024    thrpt  30   ops/ms  70.528  1981.304  28.09x
>> Float64Vector.unsliceUnary    1024    thrpt  30   ops/ms  71.735  1994.168  27.79x
>> Int64Vecto...
>
> test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 94:
>
>> 92: fsrc[i] = random.nextFloat();
>> 93: dsrc[i] = random.nextDouble();
>> 94: }
>
> Could you please use `Generators.java`? This makes sure that we have more "interesting" values in the distribution.
Thanks for your suggestion! I will take a look at this interface.
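
For illustration only, here is a rough sketch of what "more interesting" values could mean for a `double` source array: mixing special values such as NaN, infinities and signed zero into otherwise random input. The class `InterestingValues`, the `SPECIAL` table, and the one-in-four ratio are all hypothetical choices made for this sketch; this is not the API of `Generators.java` and not the code used in the test.

```java
import java.util.Random;

public class InterestingValues {
    // Hypothetical set of special values; not taken from Generators.java.
    static final double[] SPECIAL = {
        0.0, -0.0, Double.NaN,
        Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
        Double.MIN_VALUE, Double.MAX_VALUE
    };

    // Fills dst so that, on average, one element in four is a special value
    // and the rest are uniform random doubles in [0, 1).
    static void fill(double[] dst, Random random) {
        for (int i = 0; i < dst.length; i++) {
            if (random.nextInt(4) == 0) {
                dst[i] = SPECIAL[random.nextInt(SPECIAL.length)];
            } else {
                dst[i] = random.nextDouble();
            }
        }
    }

    public static void main(String[] args) {
        double[] dsrc = new double[1024];
        fill(dsrc, new Random(42)); // fixed seed keeps the input reproducible

        int nonFinite = 0;
        for (double v : dsrc) {
            if (!Double.isFinite(v)) {
                nonFinite++;
            }
        }
        System.out.println("non-finite elements: " + nonFinite);
    }
}
```

With plain `random.nextDouble()` every element lies in [0, 1), so a rearrange test never moves a NaN, an infinity, or a negative zero through the shuffled lanes. A mixed distribution like the one above exercises those bit patterns as well.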
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1998101209