[vectorIntrinsics] RFR: 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267
Jie Fu
jiefu at openjdk.java.net
Sat Feb 27 13:36:08 UTC 2021
Hi all,
The performance of the Vector API's pow operator has decreased by more than 50% on micro benchmarks such as:
Double128Vector.POW
Double256Vector.POW
DoubleMaxVector.POW
DoubleScalar.POW
Float128Vector.POW
Float256Vector.POW
FloatMaxVector.POW
Experiments show that SVML's pow intrinsics are slow for every vector size except 512 bits.
So this fix allows only 512-bit vectors to be intrinsified with SVML and disables the SVML pow intrinsic for all other vector sizes.
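The selection rule above can be sketched as follows. This is an illustration only: the class `SvmlPowPolicy` and the method `useSvmlPow` are hypothetical names, and the real change is the 6-line patch linked below, not this code.

```java
// Hypothetical sketch of the rule "use the SVML pow intrinsic only for 512-bit vectors".
// Names are illustrative and are not taken from the actual patch.
public final class SvmlPowPolicy {
    private SvmlPowPolicy() {}

    /** Returns true if pow on a vector of the given bit width should be intrinsified with SVML. */
    public static boolean useSvmlPow(int vectorBits) {
        // Only the 512-bit SVML pow measured faster; every other width is not intrinsified with SVML.
        return vectorBits == 512;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {64, 128, 256, 512}) {
            System.out.println(bits + "-bit: " + (useSvmlPow(bits) ? "SVML intrinsic" : "no SVML intrinsic"));
        }
    }
}
```

On AVX-512 hardware the Max species are presumably 512 bits wide, which would be consistent with DoubleMaxVector.POW and FloatMaxVector.POW keeping their scores in the results below.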
Here is the effect of this fix (throughput in ops/ms; higher is better).
Benchmark            (size)   Mode  Cnt      Before (ops/ms)      After (ops/ms)
--------------------------------------------------------------------------------
Double128Vector.POW    1024  thrpt    5     14.895 ±   0.070     31.897 ±  0.203
Double256Vector.POW    1024  thrpt    5     15.650 ±   1.274     36.690 ±  2.848
Double512Vector.POW    1024  thrpt    5    263.472 ±   0.062    261.681 ± 13.817
Double64Vector.POW     1024  thrpt    5     17.881 ±   0.244     17.734 ±  0.184
DoubleMaxVector.POW    1024  thrpt    5    263.613 ±   0.132    263.085 ±  0.167
DoubleScalar.POW       1024  thrpt    5     45.268 ±   0.043     45.220 ±  0.013
Float128Vector.POW     1024  thrpt    5     13.761 ±   0.092     28.578 ±  0.213
Float256Vector.POW     1024  thrpt    5     13.131 ±   0.101     29.414 ±  0.370
Float512Vector.POW     1024  thrpt    5    624.449 ± 267.160    649.519 ±  2.295
Float64Vector.POW      1024  thrpt    5     10.888 ±   0.069     26.376 ±  0.601
FloatMaxVector.POW     1024  thrpt    5    658.723 ±   2.445    663.723 ±  2.852
FloatScalar.POW        1024  thrpt    5     30.682 ±   0.095     30.678 ±  0.074
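As a worked check of the "more than 50%" figure, the small program below takes the rows of the table whose scores changed and computes the fraction of throughput lost in the Before build relative to the After build (1 - before/after), plus the After/Before speedup. The numbers are copied from the table; the class and field names are illustrative.

```java
// Worked check of the regression size using the Before/After scores from the table (ops/ms, size = 1024).
public final class PowRegression {
    private PowRegression() {}

    // Rows from the table whose throughput changed with the fix.
    public static final String[] NAMES = {
        "Double128Vector.POW", "Double256Vector.POW",
        "Float128Vector.POW", "Float256Vector.POW", "Float64Vector.POW"
    };
    public static final double[] BEFORE = {14.895, 15.650, 13.761, 13.131, 10.888};
    public static final double[] AFTER  = {31.897, 36.690, 28.578, 29.414, 26.376};

    /** Fraction of throughput lost in the Before build relative to the After build. */
    public static double degradation(double before, double after) {
        return 1.0 - before / after;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            double loss = degradation(BEFORE[i], AFTER[i]);
            double speedup = AFTER[i] / BEFORE[i];
            System.out.printf("%-20s loss %5.1f%%  speedup %.2fx%n", NAMES[i], 100 * loss, speedup);
        }
    }
}
```

For these rows the loss ranges from about 52% (Float128Vector.POW) to about 59% (Float64Vector.POW), so the fix roughly doubles throughput for the affected shapes.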
Thanks.
Best regards,
Jie
-------------
Commit messages:
- 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267
Changes: https://git.openjdk.java.net/panama-vector/pull/42/files
Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=42&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8262498
Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/panama-vector/pull/42.diff
Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/42/head:pull/42
PR: https://git.openjdk.java.net/panama-vector/pull/42