RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v3]

Xiaohong Gong xgong at openjdk.org
Mon Mar 17 07:38:57 UTC 2025


On Mon, 17 Mar 2025 07:01:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>> 
>>  - Merge branch 'jdk:master' into JDK-8350463
>>  - Add the IR test
>>  - 8350463: AArch64: Add vector rearrange support for small lane count vectors
>>    
>>    The AArch64 vector rearrange implementation currently lacks support for
>>    vector types with lane counts < 4 (see [1]). This limitation results in
>>    significant performance gaps when running Long/Double vector benchmarks
>>    on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to
>>    other SVE and x86 platforms.
>>    
>>    Vector rearrange operations depend on vector shuffle inputs, which
>>    previously used a byte array as payload. The minimum vector lane count
>>    of 4 for the byte type on AArch64 imposed this limitation on rearrange
>>    operations. However, the vector shuffle payload has since been updated
>>    to use vector-specific data types (e.g., `int` for `IntVector`) (see
>>    [2]). This change lets us remove the lane count restriction for vector
>>    rearrange operations.
>>    
>>    This patch adds rearrange support for vector types with small lane
>>    counts. The main changes are:
>>     - Added AArch64 match rule support for `VectorRearrange` with smaller
>>       lane counts (e.g., `2D/2S`)
>>     - Relocated the NEON implementation from the ad file to the C2 macro
>>       assembler file, which handles the more complex implementation better
>>     - Reduced temporary register usage in the NEON implementation for
>>       short/int/float types from two registers to one
>>    
>>    The following performance data comes from several Vector API JMH
>>    benchmarks run on an NVIDIA Grace CPU with NEON and SVE. Performance of
>>    the same benchmarks with other vector types remains unchanged.
>>    
>>    1) NEON
>>    
>>    JMH on panama-vector:vectorIntrinsics:
>>    ```
>>    Benchmark                    (size) Mode   Cnt Units   Before    After   Gain
>>    Double128Vector.rearrange     1024  thrpt  30  ops/ms  78.060   578.859  7.42x
>>    Double128Vector.sliceUnary    1024  thrpt  30  ops/ms  72.332  1811.664  25.05x
>>    Double128Vector.unsliceUnary  1024  thrpt  30  ops/ms  72.256  1812.344  25.08x
>>    Float64Vector.rearrange       1024  thrpt  30  ops/ms  77.879   558.797  7.18x
>>    Float64Vector.sliceUnary      1024  thrpt  30  ops/ms  70.528  1981.304  28.09x
>>    Float64Vector.unsliceUnary    1024  thrpt  30  ops/ms  71.735  1994.168  27.79x
>>    Int64Vecto...
>
> test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 94:
> 
>> 92:             fsrc[i] = random.nextFloat();
>> 93:             dsrc[i] = random.nextDouble();
>> 94:         }
> 
> Could you please use `Generators.java`? This makes sure that we have more "interesting" values in the distribution.

Thanks for your suggestion! I will take a look at this interface.
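
For context, here is a minimal sketch of what "more interesting values" could mean for the float/double inputs filled at lines 92-93 of the test. It does not use `Generators.java`. The `InterestingValues` class, its `fillFloats`/`fillDoubles` helpers, and the 1-in-8 mixing ratio are all hypothetical, chosen only to illustrate mixing edge cases into an otherwise uniform distribution.

```java
import java.util.Random;

// Hypothetical helper for illustration only; this is NOT the Generators.java API.
public class InterestingValues {
    // Edge cases that a uniform random.nextFloat() in [0, 1) rarely or never produces.
    private static final float[] SPECIAL_FLOATS = {
        0.0f, -0.0f, Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
        Float.MIN_VALUE, Float.MIN_NORMAL, Float.MAX_VALUE
    };

    private static final double[] SPECIAL_DOUBLES = {
        0.0, -0.0, Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
        Double.MIN_VALUE, Double.MIN_NORMAL, Double.MAX_VALUE
    };

    // Fill the array with uniform values, replacing roughly 1 in 8 entries
    // with a special value (the ratio is an arbitrary assumption).
    public static void fillFloats(float[] a, Random random) {
        for (int i = 0; i < a.length; i++) {
            a[i] = (random.nextInt(8) == 0)
                    ? SPECIAL_FLOATS[random.nextInt(SPECIAL_FLOATS.length)]
                    : random.nextFloat();
        }
    }

    public static void fillDoubles(double[] a, Random random) {
        for (int i = 0; i < a.length; i++) {
            a[i] = (random.nextInt(8) == 0)
                    ? SPECIAL_DOUBLES[random.nextInt(SPECIAL_DOUBLES.length)]
                    : random.nextDouble();
        }
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        float[] fsrc = new float[1024];
        double[] dsrc = new double[1024];
        fillFloats(fsrc, random);
        fillDoubles(dsrc, random);

        // Count how many inputs are NaN or infinite, which plain nextFloat() never yields.
        int nonFinite = 0;
        for (float v : fsrc) {
            if (!Float.isFinite(v)) {
                nonFinite++;
            }
        }
        System.out.println("non-finite float inputs: " + nonFinite);
    }
}
```

With inputs like these, a result check presumably needs a bitwise or NaN-aware comparison (for example `Float.floatToRawIntBits` or `Float.compare`), since `NaN == NaN` is false in Java.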

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1998101209

