RFR: 8290249: Vectorize signum on AArch64 [v3]

Fri Aug 19 10:13:01 UTC 2022

> This patch auto-vectorizes the Math.signum intrinsic for float and double
> types on AArch64 (Neon and SVE). On machines that support SVE, Neon code
> is emitted if MaxVectorSize <= 16, and SVE code is emitted if
> MaxVectorSize > 16.
> 
> Performance data for the microbenchmark
> test/micro/org/openjdk/bench/vm/compiler/VectorSignum.java:
> 
> 
> Benchmark                  Size      A      B      C
> VectorSignum.doubleSignum   256   1.79   1.70   3.18
> VectorSignum.doubleSignum   512   1.86   1.73   3.69
> VectorSignum.doubleSignum  1024   1.89   1.74   2.98
> VectorSignum.doubleSignum  2048   1.92   1.75   3.04
> VectorSignum.floatSignum    256   3.34   3.06   3.92
> VectorSignum.floatSignum    512   3.63   3.22   5.27
> VectorSignum.floatSignum   1024   3.76   3.35   4.77
> VectorSignum.floatSignum   2048   3.85   3.47   5.59
> 
> 
> Machines A, B and C are:
> A : 128-bit Neon machine
> B : 256-bit SVE machine
> C : 512-bit SVE machine
> 
> The numbers in the table are gain ratios: the runtime (ns/op) of the
> scalar, non-vectorized intrinsic code relative to the runtime of the
> vectorized intrinsic (this patch).
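
For readers unfamiliar with the shape of code this targets, here is a minimal sketch of the kind of counted loop over an array that the description above refers to. It is not the benchmark source; the class and method names are hypothetical and chosen only for illustration. Whether such a loop is actually vectorized depends on the JIT and the platform.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the patch or the benchmark.
final class SignumLoopSketch {

    // Element-wise Math.signum over a double array: a simple counted loop
    // of the kind an auto-vectorizer can consider.
    public static double[] signum(double[] in) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.signum(in[i]);
        }
        return out;
    }

    // Same loop for float; each vector holds twice as many float lanes
    // as double lanes at a given vector width.
    public static float[] signum(float[] in) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.signum(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(signum(new double[] {-2.5, 0.0, 3.0})));
        System.out.println(Arrays.toString(signum(new float[] {-7.0f, 4.0f})));
    }
}
```

On an SVE machine, running such a loop with different values of the HotSpot option -XX:MaxVectorSize should be one way to exercise both the Neon path (<= 16 bytes) and the SVE path (> 16 bytes) described above.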

Bhavana-Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:

 - Add signum implementation in the aarch64_vector.ad file
 - Merge master
 - Merge sve_facgt with int/fp compare and few optimizations
 - Merge master
 - 8290249: Vectorize signum on AArch64

-------------

Changes: https://git.openjdk.org/jdk/pull/9807/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9807&range=02
  Stats: 462 lines in 10 files changed: 134 ins; 2 del; 326 mod
  Patch: https://git.openjdk.org/jdk/pull/9807.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9807/head:pull/9807

PR: https://git.openjdk.org/jdk/pull/9807