RFR: 8290249: Vectorize signum on AArch64 [v2]

Tue Aug 16 13:31:15 UTC 2022

On Tue, 16 Aug 2022 12:58:01 GMT, Bhavana-Kilambi <duke at openjdk.org> wrote:

>> This patch auto-vectorizes the Math.signum intrinsic for float and double
>> types on AArch64 (Neon and SVE). On SVE-capable machines, the Neon code
>> is emitted if MaxVectorSize <= 16, and the SVE code is emitted if
>> MaxVectorSize > 16.
>> 
>> Performance data for the microbenchmark
>> test/micro/org/openjdk/bench/vm/compiler/VectorSignum.java:
>>
>> Benchmark                  Size     A     B     C
>> VectorSignum.doubleSignum   256  1.79  1.70  3.18
>> VectorSignum.doubleSignum   512  1.86  1.73  3.69
>> VectorSignum.doubleSignum  1024  1.89  1.74  2.98
>> VectorSignum.doubleSignum  2048  1.92  1.75  3.04
>> VectorSignum.floatSignum    256  3.34  3.06  3.92
>> VectorSignum.floatSignum    512  3.63  3.22  5.27
>> VectorSignum.floatSignum   1024  3.76  3.35  4.77
>> VectorSignum.floatSignum   2048  3.85  3.47  5.59
>>
>> Machines:
>> A: 128-bit Neon machine
>> B: 256-bit SVE machine
>> C: 512-bit SVE machine
>> 
>> Each number is a gain ratio: the runtime (ns/op) of the scalar,
>> non-vectorized intrinsic divided by the runtime of the vectorized
>> intrinsic from this patch. Higher is better.
>
> Bhavana-Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
> 
>  - Merge sve_facgt with int/fp compare and few optimizations
>  - Merge master
>  - 8290249: Vectorize signum on AArch64
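For readers wondering how signum can be vectorized without branches, here is a rough per-lane scalar model in C++. It is a sketch under stated assumptions, not the patch's code: it assumes a compare-absolute-greater-than against zero (in the spirit of the FACGT instruction named in the sve_facgt commit) yields an all-ones lane mask, which is shifted right by one to clear its sign bit and then used to bit-select between 1.0 and the input. The function names are hypothetical. The goal is Java's Math.signum contract: NaN and +/-0.0 are returned unchanged, and every other value maps to +/-1.0. The float case would be the same with 32-bit lanes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical per-lane model of a branch-free signum (not the patch's code).
// Vector hardware would apply the same steps to every lane at once.
double signum_lane(double x) {
  const double one = 1.0;
  std::uint64_t x_bits, one_bits;
  std::memcpy(&x_bits, &x, sizeof x_bits);
  std::memcpy(&one_bits, &one, sizeof one_bits);

  // Step 1: |x| > 0 ? all ones : all zeros. NaN and +/-0.0 compare false.
  std::uint64_t mask = (std::fabs(x) > 0.0) ? ~std::uint64_t{0} : std::uint64_t{0};

  // Step 2: a logical shift right by one clears the mask's sign bit, so the
  // sign bit of the result is always taken from x.
  mask >>= 1;

  // Step 3: bit select. Bits come from 1.0 where the mask is set and from x
  // everywhere else (including all bits when the mask is zero).
  std::uint64_t r_bits = (one_bits & mask) | (x_bits & ~mask);
  double r;
  std::memcpy(&r, &r_bits, sizeof r);
  return r;
}

// Spot checks against the Math.signum contract.
void check_signum_lane() {
  assert(signum_lane(42.5) == 1.0);
  assert(signum_lane(-0.001) == -1.0);
  assert(signum_lane(INFINITY) == 1.0);
  assert(signum_lane(-INFINITY) == -1.0);
  // +/-0.0 are returned unchanged, including the sign of zero.
  assert(signum_lane(0.0) == 0.0 && !std::signbit(signum_lane(0.0)));
  assert(signum_lane(-0.0) == 0.0 && std::signbit(signum_lane(-0.0)));
  assert(std::isnan(signum_lane(NAN)));
}
```

Because NaN fails the absolute compare, it falls through the bit select untouched, which is why no separate NaN check is needed in this model.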

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3530:

> 3528:       case EQ: cond_op = (op2 << 2) | 0b10; break;                                     \
> 3529:       case NE: cond_op = (op2 << 2) | 0b11; break;                                     \
> 3530:       case GE: cond_op = (op2 << 2) | ((op2 == 0b11) ? 0b01 : 0b00); break;            \

Would something like this be easier to understand?

```
    bool is_absolute = op2 == 0b11;                                                  \
    ....
      case GE: cond_op = (op2 << 2) | (is_absolute ? 0b01 : 0b00); break;            \
```
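To see the suggestion outside the macro, here is a standalone C++ sketch. Only the three cases quoted above are modeled; the enum, the function name cond_op_for, and the reading of op2 == 0b11 as the absolute-compare form (taken from the suggested name is_absolute) are assumptions for illustration, not the actual assembler code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical subset of condition codes: only the cases quoted in the review.
enum Condition { EQ, NE, GE };

// Hypothetical helper: computes the same cond_op values as the quoted lines,
// with the op2 == 0b11 test named once instead of repeated inline.
unsigned cond_op_for(Condition cond, unsigned op2) {
  const bool is_absolute = (op2 == 0b11);
  switch (cond) {
    case EQ: return (op2 << 2) | 0b10u;
    case NE: return (op2 << 2) | 0b11u;
    case GE: return (op2 << 2) | (is_absolute ? 0b01u : 0b00u);
  }
  throw std::invalid_argument("condition not modeled in this sketch");
}

// Example: GE with the absolute form sets the low bits to 0b01.
void check_cond_op() {
  assert(cond_op_for(GE, 0b11) == 0b1101u);
  assert(cond_op_for(GE, 0b01) == 0b0100u);
}
```

The encoding is unchanged; the named boolean only documents why GE gets different low bits for one op2 value.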

-------------

PR: https://git.openjdk.org/jdk/pull/9807