RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4]

Andrew Haley aph at openjdk.org
Fri Nov 1 11:04:28 UTC 2024


On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help review this patch? It was previously https://github.com/openjdk/jdk/pull/18605.
>> This pr is based on https://github.com/openjdk/jdk/pull/20781.
>> 
>> Thanks!
>> 
>> ## Test
>> ### tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> ### options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Tests
>> The JMH tests are in test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ on the vectorIntrinsics branch of panama-vector. It would be good to have these tests in JDK mainline; I will do that in a separate PR later. (These tests are auto-generated from a script and template. It would be good to have the script and template in JDK mainline too, but they generate many more tests than we need here, so it is better to add the tests, script, and template in another PR.)
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Performance data
>> I have re-tested; there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that PR.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add comment for tanh

Here are my results on an Apple M1. They are pretty similar to what we've seen, but this machine has no SVE.

Looks good.


                                            Score (us)      Relative performance
Benchmark              (size)  Mode  Cnt    Stubs no Stubs  (no Stubs / Stubs)
DoubleMaxVector.ACOS     1024  avgt    5   3.962   5.523   1.39
DoubleMaxVector.ASIN     1024  avgt    5   3.236   5.460   1.69
DoubleMaxVector.ATAN     1024  avgt    5   4.856  10.117   2.08
DoubleMaxVector.ATAN2    1024  avgt    5   7.144  18.977   2.66
DoubleMaxVector.CBRT     1024  avgt    5   8.802   9.837   1.12
DoubleMaxVector.COS      1024  avgt    5   6.281   8.789   1.40
DoubleMaxVector.COSH     1024  avgt    5   6.431   8.044   1.25
DoubleMaxVector.EXP      1024  avgt    5   1.939   6.417   3.31
DoubleMaxVector.EXPM1    1024  avgt    5   5.412   9.002   1.66
DoubleMaxVector.HYPOT    1024  avgt    5   4.269  12.323   2.89
DoubleMaxVector.LOG      1024  avgt    5   4.165   8.533   2.05
DoubleMaxVector.LOG10    1024  avgt    5   4.381  11.738   2.68
DoubleMaxVector.LOG1P    1024  avgt    5   4.383  12.135   2.77
DoubleMaxVector.POW      1024  avgt    5  14.060  22.053   1.57
DoubleMaxVector.SIN      1024  avgt    5   5.423   8.652   1.60
DoubleMaxVector.SINH     1024  avgt    5   6.251   8.168   1.31
DoubleMaxVector.TAN      1024  avgt    5   9.271  22.238   2.40
DoubleMaxVector.TANH     1024  avgt    5   4.515   4.499   1.00
Float64Vector.ACOS       1024  avgt    5   3.600   5.472   1.52
Float64Vector.ASIN       1024  avgt    5   2.776   5.547   2.00
Float64Vector.ATAN       1024  avgt    5   3.932  10.129   2.58
Float64Vector.ATAN2      1024  avgt    5   5.913  15.960   2.70
Float64Vector.CBRT       1024  avgt    5   7.464  10.078   1.35
Float64Vector.COS        1024  avgt    5  10.620   9.058   0.85
Float64Vector.COSH       1024  avgt    5   5.899   8.268   1.40
Float64Vector.EXP        1024  avgt    5   1.444   6.642   4.60
Float64Vector.EXPM1      1024  avgt    5   5.467   9.108   1.67
Float64Vector.HYPOT      1024  avgt    5   4.133   9.833   2.38
Float64Vector.LOG        1024  avgt    5   3.172   8.820   2.78
Float64Vector.LOG10      1024  avgt    5   3.346  12.142   3.63
Float64Vector.LOG1P      1024  avgt    5   3.216  12.507   3.89
Float64Vector.POW        1024  avgt    5  13.841  22.105   1.60
Float64Vector.SIN        1024  avgt    5  10.464   8.796   0.84
Float64Vector.SINH       1024  avgt    5   6.680   8.243   1.23
Float64Vector.TAN        1024  avgt    5  10.967  26.275   2.40
Float64Vector.TANH       1024  avgt    5   4.516   4.561   1.01
FloatMaxVector.ACOS      1024  avgt    5   1.819   3.752   2.06
FloatMaxVector.ASIN      1024  avgt    5   1.395   3.682   2.64
FloatMaxVector.ATAN      1024  avgt    5   1.970   7.003   3.55
FloatMaxVector.ATAN2     1024  avgt    5   2.951  12.313   4.17
FloatMaxVector.CBRT      1024  avgt    5   3.733   6.510   1.74
FloatMaxVector.COS       1024  avgt    5   5.405   7.363   1.36
FloatMaxVector.COSH      1024  avgt    5   2.951   5.741   1.95
FloatMaxVector.EXP       1024  avgt    5   0.725   4.745   6.54
FloatMaxVector.EXPM1     1024  avgt    5   2.732   6.490   2.38
FloatMaxVector.HYPOT     1024  avgt    5   2.062   6.328   3.07
FloatMaxVector.LOG       1024  avgt    5   1.587   6.847   4.31
FloatMaxVector.LOG10     1024  avgt    5   1.679  10.035   5.98
FloatMaxVector.LOG1P     1024  avgt    5   1.608   8.616   5.36
FloatMaxVector.POW       1024  avgt    5   6.916  19.432   2.81
FloatMaxVector.SIN       1024  avgt    5   5.239   7.202   1.37
FloatMaxVector.SINH      1024  avgt    5   2.992   5.681   1.90
FloatMaxVector.TAN       1024  avgt    5   5.562  17.419   3.13
FloatMaxVector.TANH      1024  avgt    5   2.788   2.791   1.00
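
For anyone re-deriving the last column: it appears to be the no-Stubs score divided by the Stubs score, so values above 1.00 mean the stub (SLEEF) path is faster. A minimal sketch of that calculation follows; the class and method names are made up for illustration and are not part of the patch or of JMH.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: recomputes the
// "relative performance" column from the two avgt scores.
public class RelativePerformance {

    // Ratio of average time without stubs to average time with stubs.
    // Both scores are times per operation, so a ratio > 1 means the
    // stub path is faster.
    static double ratio(double stubsScore, double noStubsScore) {
        if (stubsScore <= 0.0) {
            throw new IllegalArgumentException("stubs score must be positive");
        }
        return noStubsScore / stubsScore;
    }

    // Formats the ratio with two decimals, as in the table.
    static String format(double stubsScore, double noStubsScore) {
        return String.format(Locale.ROOT, "%.2f", ratio(stubsScore, noStubsScore));
    }

    public static void main(String[] args) {
        // Scores copied from the table above.
        System.out.println("DoubleMaxVector.EXP  " + format(1.939, 6.417));
        System.out.println("Float64Vector.COS    " + format(10.620, 9.058));
        System.out.println("FloatMaxVector.TANH  " + format(2.788, 2.791));
    }
}
```

Read this way, the only rows below 1.00 are Float64Vector.COS and Float64Vector.SIN, and the three TANH rows are essentially at parity.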

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2451695886

