RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4]
Andrew Haley
aph at openjdk.org
Fri Nov 1 11:04:28 UTC 2024
On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi,
>> Could you help review this patch? It was previously https://github.com/openjdk/jdk/pull/18605.
>> This PR is based on https://github.com/openjdk/jdk/pull/20781.
>>
>> Thanks!
>>
>> ## Test
>> ### tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> ### options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Tests
>> The JMH tests are in test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ on the vectorIntrinsics branch of panama-vector. It would be good to have these tests in JDK mainline; I will add them in a separate PR later. (These tests are auto-generated from a script and templates. It would also be good to bring the script and templates into JDK mainline, but they generate many more tests than we need here, so it is better to add the tests together with the script and templates in another PR.)
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Performance data
>> I have re-tested; there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please refer to the performance data in that PR.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> add comment for tanh
Here are my results on Apple M1. Pretty similar to what we've seen, but without SVE.
Looks good.
Scores are JMH average times in us (lower is better). Speedup = no-stubs score / stubs score.

Benchmark              (size)  Mode  Cnt  Stubs (us)  No stubs (us)  Speedup
DoubleMaxVector.ACOS     1024  avgt    5       3.962          5.523     1.39
DoubleMaxVector.ASIN     1024  avgt    5       3.236          5.460     1.69
DoubleMaxVector.ATAN     1024  avgt    5       4.856         10.117     2.08
DoubleMaxVector.ATAN2    1024  avgt    5       7.144         18.977     2.66
DoubleMaxVector.CBRT     1024  avgt    5       8.802          9.837     1.12
DoubleMaxVector.COS      1024  avgt    5       6.281          8.789     1.40
DoubleMaxVector.COSH     1024  avgt    5       6.431          8.044     1.25
DoubleMaxVector.EXP      1024  avgt    5       1.939          6.417     3.31
DoubleMaxVector.EXPM1    1024  avgt    5       5.412          9.002     1.66
DoubleMaxVector.HYPOT    1024  avgt    5       4.269         12.323     2.89
DoubleMaxVector.LOG      1024  avgt    5       4.165          8.533     2.05
DoubleMaxVector.LOG10    1024  avgt    5       4.381         11.738     2.68
DoubleMaxVector.LOG1P    1024  avgt    5       4.383         12.135     2.77
DoubleMaxVector.POW      1024  avgt    5      14.060         22.053     1.57
DoubleMaxVector.SIN      1024  avgt    5       5.423          8.652     1.60
DoubleMaxVector.SINH     1024  avgt    5       6.251          8.168     1.31
DoubleMaxVector.TAN      1024  avgt    5       9.271         22.238     2.40
DoubleMaxVector.TANH     1024  avgt    5       4.515          4.499     1.00
Float64Vector.ACOS       1024  avgt    5       3.600          5.472     1.52
Float64Vector.ASIN       1024  avgt    5       2.776          5.547     2.00
Float64Vector.ATAN       1024  avgt    5       3.932         10.129     2.58
Float64Vector.ATAN2      1024  avgt    5       5.913         15.960     2.70
Float64Vector.CBRT       1024  avgt    5       7.464         10.078     1.35
Float64Vector.COS        1024  avgt    5      10.620          9.058     0.85
Float64Vector.COSH       1024  avgt    5       5.899          8.268     1.40
Float64Vector.EXP        1024  avgt    5       1.444          6.642     4.60
Float64Vector.EXPM1      1024  avgt    5       5.467          9.108     1.67
Float64Vector.HYPOT      1024  avgt    5       4.133          9.833     2.38
Float64Vector.LOG        1024  avgt    5       3.172          8.820     2.78
Float64Vector.LOG10      1024  avgt    5       3.346         12.142     3.63
Float64Vector.LOG1P      1024  avgt    5       3.216         12.507     3.89
Float64Vector.POW        1024  avgt    5      13.841         22.105     1.60
Float64Vector.SIN        1024  avgt    5      10.464          8.796     0.84
Float64Vector.SINH       1024  avgt    5       6.680          8.243     1.23
Float64Vector.TAN        1024  avgt    5      10.967         26.275     2.40
Float64Vector.TANH       1024  avgt    5       4.516          4.561     1.01
FloatMaxVector.ACOS      1024  avgt    5       1.819          3.752     2.06
FloatMaxVector.ASIN      1024  avgt    5       1.395          3.682     2.64
FloatMaxVector.ATAN      1024  avgt    5       1.970          7.003     3.55
FloatMaxVector.ATAN2     1024  avgt    5       2.951         12.313     4.17
FloatMaxVector.CBRT      1024  avgt    5       3.733          6.510     1.74
FloatMaxVector.COS       1024  avgt    5       5.405          7.363     1.36
FloatMaxVector.COSH      1024  avgt    5       2.951          5.741     1.95
FloatMaxVector.EXP       1024  avgt    5       0.725          4.745     6.54
FloatMaxVector.EXPM1     1024  avgt    5       2.732          6.490     2.38
FloatMaxVector.HYPOT     1024  avgt    5       2.062          6.328     3.07
FloatMaxVector.LOG       1024  avgt    5       1.587          6.847     4.31
FloatMaxVector.LOG10     1024  avgt    5       1.679         10.035     5.98
FloatMaxVector.LOG1P     1024  avgt    5       1.608          8.616     5.36
FloatMaxVector.POW       1024  avgt    5       6.916         19.432     2.81
FloatMaxVector.SIN       1024  avgt    5       5.239          7.202     1.37
FloatMaxVector.SINH      1024  avgt    5       2.992          5.681     1.90
FloatMaxVector.TAN       1024  avgt    5       5.562         17.419     3.13
FloatMaxVector.TANH      1024  avgt    5       2.788          2.791     1.00
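The last column of the table above (relative performance) is the no-stubs score divided by the stubs score, so values above 1.00 mean the SLEEF stubs are faster. Below is a minimal Java sketch that recomputes that ratio for a handful of rows copied from the table and flags benchmarks where the stubs are slower. The class, record and method names are hypothetical, and the 0.95 cut-off is an arbitrary choice to ignore noise-level differences such as TANH. It assumes JDK 16 or later (records, Stream.toList()).

```java
import java.util.List;

/**
 * Recomputes the relative-performance column of the benchmark table:
 * speedup = score without stubs / score with stubs (JMH avgt, lower score is better),
 * so a value above 1.0 means the stub version is faster.
 */
public class SpeedupTable {

    /** One benchmark row; both scores are JMH average times in microseconds. */
    record Row(String benchmark, double stubsUs, double noStubsUs) {
        double speedup() {
            return noStubsUs / stubsUs;
        }
    }

    /** Returns the rows whose speedup is below the given threshold. */
    static List<Row> regressions(List<Row> rows, double threshold) {
        return rows.stream()
                   .filter(r -> r.speedup() < threshold)
                   .toList();
    }

    public static void main(String[] args) {
        // A few rows copied from the Apple M1 results above.
        List<Row> rows = List.of(
                new Row("DoubleMaxVector.EXP", 1.939, 6.417),
                new Row("DoubleMaxVector.TANH", 4.515, 4.499),
                new Row("Float64Vector.COS", 10.620, 9.058),
                new Row("Float64Vector.SIN", 10.464, 8.796),
                new Row("FloatMaxVector.EXP", 0.725, 4.745));

        for (Row r : rows) {
            System.out.printf("%-22s %6.2f%n", r.benchmark(), r.speedup());
        }

        // Hypothetical cut-off of 0.95: TANH (about 0.996) is treated as noise, not a regression.
        List<String> slower = regressions(rows, 0.95).stream()
                                                      .map(Row::benchmark)
                                                      .toList();
        System.out.println("Slower with stubs: " + slower);
    }
}
```

Running it prints ratios that agree with the table to two decimal places and, among these rows, lists only Float64Vector.COS and Float64Vector.SIN as slower with stubs.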
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2451695886
More information about the build-dev mailing list