[vectorIntrinsics] RFR: 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267

Jie Fu jiefu at openjdk.java.net
Sat Feb 27 13:36:08 UTC 2021


Hi all,

The performance of the Vector API's pow operator has dropped by more than 50% since JDK-8261267 in micro benchmarks such as:
  Double128Vector.POW
  Double256Vector.POW
  DoubleMaxVector.POW
  DoubleScalar.POW
  Float128Vector.POW
  Float256Vector.POW
  FloatMaxVector.POW

Experiments show that svml's pow intrinsics are slow for every vector size except 512 bits.
This fix therefore intrinsifies pow with svml only for 512-bit vectors and disables the svml path for all other sizes.
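The size check described above can be pictured with a minimal Java sketch. It is an illustration only and not the HotSpot change itself: the class name SvmlPowGate, the method useSvmlForPow, and the constant SVML_POW_BITS are hypothetical names, and the code only models the decision "512-bit goes to svml, everything else does not".

```java
// Hypothetical illustration only: this is not the code from the patch.
final class SvmlPowGate {
    // Assumption taken from the description above: only 512-bit vectors use svml for pow.
    static final int SVML_POW_BITS = 512;

    /** Returns true when pow on a vector of the given bit width should use the svml intrinsic. */
    static boolean useSvmlForPow(int vectorBits) {
        return vectorBits == SVML_POW_BITS;
    }

    public static void main(String[] args) {
        // Print the decision for the vector widths that appear in the benchmarks.
        for (int bits : new int[] {64, 128, 256, 512}) {
            String path = useSvmlForPow(bits) ? "svml intrinsic" : "non-svml fallback";
            System.out.println(bits + "-bit pow -> " + path);
        }
    }
}
```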

Here is the effect of this fix.
                                           Before                    |                                              After
------------------------------------------------------------------------------------------------------------------------------------------
Benchmark            (size)   Mode  Cnt    Score     Error   Units   |   Benchmark            (size)   Mode  Cnt    Score    Error   Units
Double128Vector.POW    1024  thrpt    5   14.895 ±   0.070  ops/ms   |   Double128Vector.POW    1024  thrpt    5   31.897 ±  0.203  ops/ms
Double256Vector.POW    1024  thrpt    5   15.650 ±   1.274  ops/ms   |   Double256Vector.POW    1024  thrpt    5   36.690 ±  2.848  ops/ms
Double512Vector.POW    1024  thrpt    5  263.472 ±   0.062  ops/ms   |   Double512Vector.POW    1024  thrpt    5  261.681 ± 13.817  ops/ms
Double64Vector.POW     1024  thrpt    5   17.881 ±   0.244  ops/ms   |   Double64Vector.POW     1024  thrpt    5   17.734 ±  0.184  ops/ms
DoubleMaxVector.POW    1024  thrpt    5  263.613 ±   0.132  ops/ms   |   DoubleMaxVector.POW    1024  thrpt    5  263.085 ±  0.167  ops/ms
DoubleScalar.POW       1024  thrpt    5   45.268 ±   0.043  ops/ms   |   DoubleScalar.POW       1024  thrpt    5   45.220 ±  0.013  ops/ms
Float128Vector.POW     1024  thrpt    5   13.761 ±   0.092  ops/ms   |   Float128Vector.POW     1024  thrpt    5   28.578 ±  0.213  ops/ms
Float256Vector.POW     1024  thrpt    5   13.131 ±   0.101  ops/ms   |   Float256Vector.POW     1024  thrpt    5   29.414 ±  0.370  ops/ms
Float512Vector.POW     1024  thrpt    5  624.449 ± 267.160  ops/ms   |   Float512Vector.POW     1024  thrpt    5  649.519 ±  2.295  ops/ms
Float64Vector.POW      1024  thrpt    5   10.888 ±   0.069  ops/ms   |   Float64Vector.POW      1024  thrpt    5   26.376 ±  0.601  ops/ms
FloatMaxVector.POW     1024  thrpt    5  658.723 ±   2.445  ops/ms   |   FloatMaxVector.POW     1024  thrpt    5  663.723 ±  2.852  ops/ms
FloatScalar.POW        1024  thrpt    5   30.682 ±   0.095  ops/ms   |   FloatScalar.POW        1024  thrpt    5   30.678 ±  0.074  ops/ms
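
To make the table easier to read, here is a small Java sketch that computes the After/Before throughput ratio for the rows that changed. The scores are copied from the table; the class name PowSpeedup and the method ratio are hypothetical helpers, and the error columns are ignored. The ratios for these five rows come out at roughly 2.1x to 2.4x.

```java
// Hypothetical helper for reading the table above; scores are in ops/ms.
final class PowSpeedup {
    /** Returns how many times faster the After score is compared with the Before score. */
    static double ratio(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        String[] names = {
            "Double128Vector.POW", "Double256Vector.POW",
            "Float128Vector.POW", "Float256Vector.POW", "Float64Vector.POW"
        };
        // Before and After scores copied from the table (error columns ignored).
        double[] before = {14.895, 15.650, 13.761, 13.131, 10.888};
        double[] after  = {31.897, 36.690, 28.578, 29.414, 26.376};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-20s %.2fx%n", names[i], ratio(before[i], after[i]));
        }
    }
}
```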

Thanks.
Best regards,
Jie

-------------

Commit messages:
 - 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267

Changes: https://git.openjdk.java.net/panama-vector/pull/42/files
 Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=42&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8262498
  Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/panama-vector/pull/42.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/42/head:pull/42

PR: https://git.openjdk.java.net/panama-vector/pull/42

