RFR: 8290249: Vectorize signum on AArch64 [v3]
Bhavana-Kilambi
duke at openjdk.org
Fri Aug 19 10:13:01 UTC 2022
> This patch auto-vectorizes the Math.signum intrinsic for float and double
> types on AArch64 (Neon and SVE). On machines that support SVE, Neon code
> is emitted if MaxVectorSize <= 16 bytes, and SVE code is emitted if
> MaxVectorSize > 16 bytes.
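
For readers unfamiliar with the loop shape involved, here is a minimal sketch of the kind of scalar loop over Math.signum that the auto-vectorizer targets. The class and method names are hypothetical and this is not the source of VectorSignum.java; it only illustrates the element-wise pattern and the Math.signum semantics (zeros and NaN are returned unchanged) that any vectorized version has to preserve.

```java
// Hypothetical illustration; not taken from the patch or from VectorSignum.java.
class SignumLoopSketch {
    // Element-wise signum: -1.0f for negatives, 1.0f for positives,
    // and the input itself for +0.0f, -0.0f and NaN.
    static void floatSignum(float[] in, float[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.signum(in[i]);
        }
    }

    // Same pattern for doubles.
    static void doubleSignum(double[] in, double[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.signum(in[i]);
        }
    }

    public static void main(String[] args) {
        float[] in = {-2.5f, -0.0f, 0.0f, 3.0f, Float.NaN};
        float[] out = new float[in.length];
        floatSignum(in, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Whether a given loop is actually vectorized depends on the JIT compiler and the flags in use; the sketch makes no claim about that.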
>
> Performance data for the microbenchmark
> test/micro/org/openjdk/bench/vm/compiler/VectorSignum.java:
>
>
> Benchmark                    Size     A      B      C
> VectorSignum.doubleSignum     256   1.79   1.70   3.18
> VectorSignum.doubleSignum     512   1.86   1.73   3.69
> VectorSignum.doubleSignum    1024   1.89   1.74   2.98
> VectorSignum.doubleSignum    2048   1.92   1.75   3.04
> VectorSignum.floatSignum      256   3.34   3.06   3.92
> VectorSignum.floatSignum      512   3.63   3.22   5.27
> VectorSignum.floatSignum     1024   3.76   3.35   4.77
> VectorSignum.floatSignum     2048   3.85   3.47   5.59
>
>
> Machine descriptions for A, B and C:
> A: 128-bit Neon machine
> B: 256-bit SVE machine
> C: 512-bit SVE machine
>
> The numbers in the table are gain ratios: the runtime (ns/op) of the
> scalar, non-vectorized intrinsic code relative to the runtime of the
> vectorized version of the intrinsic (this patch).
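
As a small worked example of how such a gain ratio could be computed, the sketch below assumes the ratio is the scalar runtime divided by the vectorized runtime, so that values above 1 mean the vectorized code is faster. The class name, method name and the ns/op inputs are hypothetical and are not measurements from this patch.

```java
// Hypothetical helper; the ns/op inputs in main are made up for illustration.
class GainRatioSketch {
    // Assumed definition: scalar runtime divided by vectorized runtime (both in ns/op).
    static double gainRatio(double scalarNsPerOp, double vectorNsPerOp) {
        if (!(scalarNsPerOp > 0.0) || !(vectorNsPerOp > 0.0)) {
            throw new IllegalArgumentException("runtimes must be positive");
        }
        return scalarNsPerOp / vectorNsPerOp;
    }

    public static void main(String[] args) {
        // Made-up runtimes: 179 ns/op scalar against 100 ns/op vectorized.
        System.out.printf("%.2f%n", gainRatio(179.0, 100.0));
    }
}
```

Under this reading, a table entry of 3.18 would mean the scalar version takes 3.18 times as long per operation as the vectorized one.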
Bhavana-Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
- Add signum implementation in the aarch64_vector.ad file
- Merge master
- Merge sve_facgt with int/fp compare and few optimizations
- Merge master
- 8290249: Vectorize signum on AArch64
-------------
Changes: https://git.openjdk.org/jdk/pull/9807/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9807&range=02
Stats: 462 lines in 10 files changed: 134 ins; 2 del; 326 mod
Patch: https://git.openjdk.org/jdk/pull/9807.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9807/head:pull/9807
PR: https://git.openjdk.org/jdk/pull/9807