RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF
Xiaohong Gong
xgong at openjdk.org
Wed Oct 18 06:19:23 UTC 2023
On Wed, 18 Oct 2023 06:12:29 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on the AArch64 platform, which causes a large performance gap on AArch64. Note that those APIs are optimized by the C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which is available in mainstream Linux distros (e.g. [3] [4]).
>
> SLEEF supports multiple accuracies. To meet the Vector API's requirements and implement the math ops on AArch64, we 1) by default call the libsleef stubs that provide 1.0 ULP accuracy and use FMA instructions for most of the operations, and 2) add the vector calling convention needed for the runtime calls to the libsleef stub code. Note that for those APIs where libsleef does not provide a 1.0 ULP version, we choose the 0.5 ULP version instead.
>
> To help load the expected libsleef library, this patch also adds an experimental JVM option (`-XX:UseSleefLib`) for AArch64 platforms. Users can set it to specify the libsleef path/name explicitly. By default, it points to the system-installed library. If the library does not exist or dynamically loading it at runtime fails, the vector math ops fall back to the default scalar version without error. A warning is printed, however, if the user explicitly specifies a nonexistent library.
>
> Note that this is a part of the patch originally proposed on panama-dev [5], with some initial review comments addressed. We would now like to get wider feedback from more HotSpot experts.
>
> [1] https://github.com/openjdk/jdk/pull/3638
> [2] https://sleef.org/
> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
> [4] https://packages.debian.org/bookworm/libsleef3
> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html
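For readers less familiar with the ULP figures mentioned in the description above, here is a minimal sketch of how an error in ULPs can be estimated in plain Java. It is not part of the patch: the class and method names (`UlpErrorSketch`, `ulpError`, `maxUlpError`) are hypothetical, and `StrictMath.sin` only stands in for a reference here. A real accuracy check would compare against a higher-precision reference rather than another `double` result.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, for illustration only; not part of the patch.
public class UlpErrorSketch {

    // Distance between approx and reference, measured in ulp(reference).
    static double ulpError(double approx, double reference) {
        return Math.abs(approx - reference) / Math.ulp(reference);
    }

    // Largest observed ULP distance between two implementations,
    // sampled at steps + 1 evenly spaced points in [lo, hi].
    static double maxUlpError(DoubleUnaryOperator candidate,
                              DoubleUnaryOperator reference,
                              double lo, double hi, int steps) {
        double worst = 0.0;
        for (int i = 0; i <= steps; i++) {
            double x = lo + (hi - lo) * i / steps;
            double err = ulpError(candidate.applyAsDouble(x),
                                  reference.applyAsDouble(x));
            worst = Math.max(worst, err);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Assumption: StrictMath.sin is used as a stand-in reference.
        double worst = maxUlpError(Math::sin, StrictMath::sin, 0.0, 1.0, 100_000);
        System.out.println("max observed difference in ULPs: " + worst);
    }
}
```

The same measurement could in principle be applied lane by lane to a vector result, using `Math.ulp(float)` for the float species.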
Here is the performance improvement for JMH benchmarks [1] [2] after enabling libsleef for AArch64 NEON and SVE:
NEON:
Benchmark              (size)   Mode  Cnt    Gain
DoubleMaxVector.ACOS     1024  thrpt    5   1.775
DoubleMaxVector.ASIN     1024  thrpt    5   2.134
DoubleMaxVector.ATAN     1024  thrpt    5   2.376
DoubleMaxVector.ATAN2    1024  thrpt    5   2.799
DoubleMaxVector.CBRT     1024  thrpt    5   1.588
DoubleMaxVector.COS      1024  thrpt    5   1.751
DoubleMaxVector.COSH     1024  thrpt    5   1.756
DoubleMaxVector.EXP      1024  thrpt    5   8.257
DoubleMaxVector.EXPM1    1024  thrpt    5   2.028
DoubleMaxVector.HYPOT    1024  thrpt    5   2.132
DoubleMaxVector.LOG      1024  thrpt    5   4.017
DoubleMaxVector.LOG10    1024  thrpt    5   5.693
DoubleMaxVector.LOG1P    1024  thrpt    5   2.788
DoubleMaxVector.POW      1024  thrpt    5   3.494
DoubleMaxVector.SIN      1024  thrpt    5   2.010
DoubleMaxVector.SINH     1024  thrpt    5   1.697
DoubleMaxVector.TAN      1024  thrpt    5   3.448
DoubleMaxVector.TANH     1024  thrpt    5   0.984
FloatMaxVector.ACOS      1024  thrpt    5   2.310
FloatMaxVector.ASIN      1024  thrpt    5   2.887
FloatMaxVector.ATAN      1024  thrpt    5   3.076
FloatMaxVector.ATAN2     1024  thrpt    5   4.162
FloatMaxVector.CBRT      1024  thrpt    5   2.941
FloatMaxVector.COS       1024  thrpt    5   1.832
FloatMaxVector.COSH      1024  thrpt    5   2.681
FloatMaxVector.EXP       1024  thrpt    5  15.758
FloatMaxVector.EXPM1     1024  thrpt    5   3.061
FloatMaxVector.HYPOT     1024  thrpt    5   3.428
FloatMaxVector.LOG       1024  thrpt    5  12.364
FloatMaxVector.LOG10     1024  thrpt    5  11.267
FloatMaxVector.LOG1P     1024  thrpt    5   5.819
FloatMaxVector.POW       1024  thrpt    5   6.710
FloatMaxVector.SIN       1024  thrpt    5   1.906
FloatMaxVector.SINH      1024  thrpt    5   2.505
FloatMaxVector.TAN       1024  thrpt    5   4.975
FloatMaxVector.TANH      1024  thrpt    5   1.157
Float64Vector.ACOS       1024  thrpt    5   1.855
Float64Vector.ASIN       1024  thrpt    5   2.294
Float64Vector.ATAN       1024  thrpt    5   2.082
Float64Vector.ATAN2      1024  thrpt    5   2.849
Float64Vector.CBRT       1024  thrpt    5   1.781
Float64Vector.COS        1024  thrpt    5   1.224
Float64Vector.COSH       1024  thrpt    5   1.793
Float64Vector.EXP        1024  thrpt    5   9.000
Float64Vector.EXPM1      1024  thrpt    5   2.096
Float64Vector.HYPOT      1024  thrpt    5   2.589
Float64Vector.LOG        1024  thrpt    5   5.582
Float64Vector.LOG10      1024  thrpt    5   5.495
Float64Vector.LOG1P      1024  thrpt    5   3.594
Float64Vector.POW        1024  thrpt    5   3.254
Float64Vector.SIN        1024  thrpt    5   1.254
Float64Vector.SINH       1024  thrpt    5   1.719
Float64Vector.TAN        1024  thrpt    5   2.670
Float64Vector.TANH       1024  thrpt    5   1.020
SVE 512-bit vector size:
Benchmark              (size)   Mode  Cnt    Gain
DoubleMaxVector.ACOS     1024  thrpt    5   1.731
DoubleMaxVector.ASIN     1024  thrpt    5   2.046
DoubleMaxVector.ATAN     1024  thrpt    5   4.932
DoubleMaxVector.ATAN2    1024  thrpt    5   6.032
DoubleMaxVector.CBRT     1024  thrpt    5   6.883
DoubleMaxVector.COS      1024  thrpt    5   5.512
DoubleMaxVector.COSH     1024  thrpt    5   2.796
DoubleMaxVector.EXP      1024  thrpt    5  42.490
DoubleMaxVector.EXPM1    1024  thrpt    5   6.188
DoubleMaxVector.HYPOT    1024  thrpt    5   2.195
DoubleMaxVector.LOG      1024  thrpt    5  19.532
DoubleMaxVector.LOG10    1024  thrpt    5  19.229
DoubleMaxVector.LOG1P    1024  thrpt    5  10.477
DoubleMaxVector.POW      1024  thrpt    5  11.887
DoubleMaxVector.SIN      1024  thrpt    5   6.073
DoubleMaxVector.SINH     1024  thrpt    5   2.994
DoubleMaxVector.TAN      1024  thrpt    5  15.417
FloatMaxVector.ACOS      1024  thrpt    5   3.867
FloatMaxVector.ASIN      1024  thrpt    5   4.291
FloatMaxVector.ATAN      1024  thrpt    5  11.786
FloatMaxVector.ATAN2     1024  thrpt    5  14.734
FloatMaxVector.CBRT      1024  thrpt    5  11.622
FloatMaxVector.COS       1024  thrpt    5   6.477
FloatMaxVector.COSH      1024  thrpt    5   3.571
FloatMaxVector.EXP       1024  thrpt    5  53.020
FloatMaxVector.EXPM1     1024  thrpt    5   6.348
FloatMaxVector.HYPOT     1024  thrpt    5   4.722
FloatMaxVector.LOG       1024  thrpt    5  41.263
FloatMaxVector.LOG10     1024  thrpt    5  47.685
FloatMaxVector.LOG1P     1024  thrpt    5  22.481
FloatMaxVector.POW       1024  thrpt    5  24.896
FloatMaxVector.SIN       1024  thrpt    5   6.768
FloatMaxVector.SINH      1024  thrpt    5   3.429
[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L1068
[2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L1068
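For context, the kind of kernel these benchmarks exercise looks roughly like the sketch below. It is an illustration, not the benchmark code itself: the class name `VectorSinSketch` is hypothetical, and the use of `SPECIES_MAX` is an assumption based on the `*MaxVector` benchmark names. Because the Vector API is an incubator module, compiling and running this needs `--add-modules jdk.incubator.vector`.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical illustration; compile and run with
//   --add-modules jdk.incubator.vector
public class VectorSinSketch {

    // Assumption: the widest species the platform supports.
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_MAX;

    // Computes sin(x) for every element of the input array.
    static float[] sin(float[] in) {
        float[] out = new float[in.length];
        int i = 0;
        int bound = SPECIES.loopBound(in.length);
        // Main loop: one lanewise SIN per full vector.
        for (; i < bound; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, in, i);
            v.lanewise(VectorOperators.SIN).intoArray(out, i);
        }
        // Scalar tail for the remaining elements.
        for (; i < in.length; i++) {
            out[i] = (float) Math.sin(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] in = new float[1024];
        for (int i = 0; i < in.length; i++) {
            in[i] = i / 1024.0f;
        }
        float[] out = sin(in);
        System.out.println("sin(" + in[512] + ") = " + out[512]);
    }
}
```

The `lanewise(VectorOperators.SIN)` call is the operation that the patch lets C2 map to a libsleef stub on AArCh64 when the library is available.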
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1767727028