RFR: 8290249: Vectorize signum on AArch64

Wed Aug 10 09:35:36 UTC 2022

On Tue, 9 Aug 2022 10:15:41 GMT, Bhavana-Kilambi <duke at openjdk.org> wrote:

> This patch auto-vectorizes the Math.signum intrinsic for float and double
> types on AArch64 (Neon and SVE). On machines that support SVE, the Neon
> code is emitted if MaxVectorSize <= 16, and the SVE code is emitted if
> MaxVectorSize > 16.
> 
> Performance data for the microbenchmark
> test/micro/org/openjdk/bench/vm/compiler/VectorSignum.java is below:
> 
> 
> Benchmark                  Size     A     B     C
> VectorSignum.doubleSignum   256  1.79  1.70  3.18
> VectorSignum.doubleSignum   512  1.86  1.73  3.69
> VectorSignum.doubleSignum  1024  1.89  1.74  2.98
> VectorSignum.doubleSignum  2048  1.92  1.75  3.04
> VectorSignum.floatSignum    256  3.34  3.06  3.92
> VectorSignum.floatSignum    512  3.63  3.22  5.27
> VectorSignum.floatSignum   1024  3.76  3.35  4.77
> VectorSignum.floatSignum   2048  3.85  3.47  5.59
> 
> 
> Machines A, B and C:
> A : 128-bit Neon machine
> B : 256-bit SVE machine
> C : 512-bit SVE machine
> 
> Each number in the table is a speedup: the runtime (ns/op) of the scalar,
> non-vectorized intrinsic divided by the runtime of the vectorized
> intrinsic from this patch (higher is better).
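To illustrate the kind of code this patch affects: a plain counted loop that applies Math.signum to each element of a float[] or double[] is a candidate for C2's auto-vectorizer, which with this patch may emit Neon or SVE instructions on AArch64. The sketch below is hypothetical (SignumLoop and signumAll are illustrative names). It is not the source of VectorSignum.java, and whether a given run actually vectorizes depends on the JVM build, the hardware, and flags such as -XX:MaxVectorSize.

```java
import java.util.Arrays;

// Hypothetical kernel shape; not taken from VectorSignum.java.
public final class SignumLoop {
    // Applies Math.signum element-wise; dst must be at least as long as src.
    static void signumAll(float[] src, float[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.signum(src[i]);
        }
    }

    // Same loop for double lanes.
    static void signumAll(double[] src, double[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.signum(src[i]);
        }
    }

    public static void main(String[] args) {
        float[] f = {-2.5f, 0.0f, 7.0f};
        float[] fr = new float[f.length];
        signumAll(f, fr);
        System.out.println(Arrays.toString(fr)); // [-1.0, 0.0, 1.0]

        double[] d = {-1e300, 3.0};
        double[] dr = new double[d.length];
        signumAll(d, dr);
        System.out.println(Arrays.toString(dr)); // [-1.0, 1.0]
    }
}
```

The loop results are identical whether or not C2 vectorizes it; only the generated machine code and the runtime differ.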

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1679:

> 1677:     case S:
> 1678:       sve_and(vtmp, T, 0x80000000); // Extract the sign bit of float value in every lane of src
> 1679:       sve_orr(vtmp, T, 0x3f800000); // OR it with +1 to make the final result +1 or -1 depending

Suggestion:

      sve_orr(vtmp, T, jint_cast(1.0f)); // OR it with +1 to make the final result +1 or -1 depending

...and likewise everywhere else.
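For reference, the per-lane trick described in the quoted SVE comments can be modeled in scalar Java. ANDing with the sign-bit mask keeps only the sign of the input, and ORing in the bit pattern of 1.0f turns that into +1.0f or -1.0f. The sketch also shows that 0x3f800000 is exactly Float.floatToRawIntBits(1.0f), which is why deriving the constant reads better than the literal. SignumBitTrick is a hypothetical name. The early return for +/-0.0f and NaN is an assumption standing in for however the full vector routine handles those inputs, since the excerpt above does not show it.

```java
// Hypothetical scalar model of the float-lane bit trick; not the HotSpot code.
public final class SignumBitTrick {
    // IEEE 754 bit pattern of 1.0f (0x3f800000), derived instead of hard-coded.
    static final int ONE_BITS = Float.floatToRawIntBits(1.0f);
    // Only the sign bit set (same value as Integer.MIN_VALUE).
    static final int SIGN_MASK = 0x80000000;

    static float signum(float x) {
        // Assumption: +0.0f, -0.0f and NaN are passed through unchanged, as Math.signum does.
        // x == 0.0f is true for both zeros; x != x is true only for NaN.
        if (x == 0.0f || x != x) {
            return x;
        }
        int bits = Float.floatToRawIntBits(x);
        // Keep only the sign bit, then OR in the bits of +1.0f: yields +1.0f or -1.0f.
        return Float.intBitsToFloat((bits & SIGN_MASK) | ONE_BITS);
    }

    public static void main(String[] args) {
        System.out.printf("0x%08x%n", ONE_BITS); // 0x3f800000
        float[] inputs = {-3.5f, 0.0f, -0.0f, Float.MIN_VALUE, Float.NEGATIVE_INFINITY, Float.NaN};
        for (float v : inputs) {
            System.out.println(v + " -> " + signum(v) + " (Math.signum: " + Math.signum(v) + ")");
        }
    }
}
```

The double-lane case works the same way with Double.doubleToRawLongBits, a 64-bit sign mask and the bits of 1.0.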

-------------

PR: https://git.openjdk.org/jdk/pull/9807