[vectorIntrinsics] RFR: 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267 [v3]

Jie Fu jiefu at openjdk.java.net
Thu Mar 4 07:26:15 UTC 2021


> Hi all,
> 
> The performance of the Vector API's pow operator has dropped by more than 50% in microbenchmarks such as:
>   Double128Vector.POW
>   Double256Vector.POW
>   DoubleMaxVector.POW
>   DoubleScalar.POW
>   Float128Vector.POW
>   Float256Vector.POW
>   FloatMaxVector.POW
> 
> Experiments show that svml's pow intrinsics are slow, except for the 512-bit ones.
> So only 512-bit vectors should be intrinsified with svml; the pow intrinsics for all other vector sizes should be disabled.
> 
> Here is the effect of this fix (throughput in ops/ms, higher is better):
>                                            Before                    |                                              After
> ------------------------------------------------------------------------------------------------------------------------------------------
> Benchmark            (size)   Mode  Cnt    Score     Error   Units   |   Benchmark            (size)   Mode  Cnt    Score    Error   Units
> Double128Vector.POW    1024  thrpt    5   14.895 ±   0.070  ops/ms   |   Double128Vector.POW    1024  thrpt    5   31.897 ±  0.203  ops/ms
> Double256Vector.POW    1024  thrpt    5   15.650 ±   1.274  ops/ms   |   Double256Vector.POW    1024  thrpt    5   36.690 ±  2.848  ops/ms
> Double512Vector.POW    1024  thrpt    5  263.472 ±   0.062  ops/ms   |   Double512Vector.POW    1024  thrpt    5  261.681 ± 13.817  ops/ms
> Double64Vector.POW     1024  thrpt    5   17.881 ±   0.244  ops/ms   |   Double64Vector.POW     1024  thrpt    5   17.734 ±  0.184  ops/ms
> DoubleMaxVector.POW    1024  thrpt    5  263.613 ±   0.132  ops/ms   |   DoubleMaxVector.POW    1024  thrpt    5  263.085 ±  0.167  ops/ms
> DoubleScalar.POW       1024  thrpt    5   45.268 ±   0.043  ops/ms   |   DoubleScalar.POW       1024  thrpt    5   45.220 ±  0.013  ops/ms
> Float128Vector.POW     1024  thrpt    5   13.761 ±   0.092  ops/ms   |   Float128Vector.POW     1024  thrpt    5   28.578 ±  0.213  ops/ms
> Float256Vector.POW     1024  thrpt    5   13.131 ±   0.101  ops/ms   |   Float256Vector.POW     1024  thrpt    5   29.414 ±  0.370  ops/ms
> Float512Vector.POW     1024  thrpt    5  624.449 ± 267.160  ops/ms   |   Float512Vector.POW     1024  thrpt    5  649.519 ±  2.295  ops/ms
> Float64Vector.POW      1024  thrpt    5   10.888 ±   0.069  ops/ms   |   Float64Vector.POW      1024  thrpt    5   26.376 ±  0.601  ops/ms
> FloatMaxVector.POW     1024  thrpt    5  658.723 ±   2.445  ops/ms   |   FloatMaxVector.POW     1024  thrpt    5  663.723 ±  2.852  ops/ms
> FloatScalar.POW        1024  thrpt    5   30.682 ±   0.095  ops/ms   |   FloatScalar.POW        1024  thrpt    5   30.678 ±  0.074  ops/ms
> 
> Thanks.
> Best regards,
> Jie
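As a quick check of the "more than 50%" figure in the subject line, the sketch below recomputes, from the Before/After scores in the table, each benchmark's speedup (after / before) and the fraction of throughput that was lost before the fix (1 - before / after). Only the scores come from the table; the class name `PowSpeedup` and the method `regression` are illustrative, not part of the benchmark suite.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PowSpeedup {
    // Throughput scores in ops/ms copied from the Before/After table.
    // Each value is {before, after}.
    static final Map<String, double[]> SCORES = new LinkedHashMap<>();
    static {
        SCORES.put("Double128Vector.POW", new double[] {14.895, 31.897});
        SCORES.put("Double256Vector.POW", new double[] {15.650, 36.690});
        SCORES.put("Double512Vector.POW", new double[] {263.472, 261.681});
        SCORES.put("Double64Vector.POW", new double[] {17.881, 17.734});
        SCORES.put("DoubleMaxVector.POW", new double[] {263.613, 263.085});
        SCORES.put("DoubleScalar.POW", new double[] {45.268, 45.220});
        SCORES.put("Float128Vector.POW", new double[] {13.761, 28.578});
        SCORES.put("Float256Vector.POW", new double[] {13.131, 29.414});
        SCORES.put("Float512Vector.POW", new double[] {624.449, 649.519});
        SCORES.put("Float64Vector.POW", new double[] {10.888, 26.376});
        SCORES.put("FloatMaxVector.POW", new double[] {658.723, 663.723});
        SCORES.put("FloatScalar.POW", new double[] {30.682, 30.678});
    }

    /** Fraction of the post-fix throughput that was missing before the fix. */
    static double regression(double before, double after) {
        return 1.0 - before / after;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : SCORES.entrySet()) {
            double before = e.getValue()[0];
            double after = e.getValue()[1];
            System.out.printf("%-20s speedup %.2fx, regression %.1f%%%n",
                    e.getKey(), after / before, 100 * regression(before, after));
        }
    }
}
```

For Double128Vector.POW this gives 1 - 14.895 / 31.897 ≈ 0.533, i.e. about 53% of the throughput was lost; the 64-, 128- and 256-bit vector rows all exceed 50%, while the 512-bit, Max and Scalar rows stay within noise.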

Jie Fu has updated the pull request incrementally with two additional commits since the last revision:

 - Fix in stubGenerator_x86_64.cpp
 - Revert change
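The selection rule described in the PR (keep svml pow only for 512-bit vectors) can be modeled in a few lines. This is a hypothetical Java sketch: `SvmlPowPolicy` and `useSvmlPow` are illustrative names, not HotSpot code, and the actual change in stubGenerator_x86_64.cpp is not reproduced here.

```java
public class SvmlPowPolicy {
    /**
     * Hypothetical model of the rule in this PR: the svml pow intrinsic is
     * used only for 512-bit vectors; 64-, 128- and 256-bit shapes take a
     * non-svml path instead.
     */
    static boolean useSvmlPow(int vectorBits) {
        return vectorBits == 512;
    }

    public static void main(String[] args) {
        int[] shapes = {64, 128, 256, 512};
        for (int bits : shapes) {
            System.out.println(bits + "-bit pow -> "
                    + (useSvmlPow(bits) ? "svml intrinsic" : "non-svml path"));
        }
    }
}
```

The MaxVector rows in the table score like the 512-bit rows, which suggests (an assumption, not stated in the PR) that the maximum vector size on the benchmark machine was 512 bits, so those benchmarks keep the svml intrinsic under this rule.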

-------------

Changes:
  - all: https://git.openjdk.java.net/panama-vector/pull/42/files
  - new: https://git.openjdk.java.net/panama-vector/pull/42/files/4ba408ca..0d3dbeba

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=42&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=42&range=01-02

  Stats: 24 lines in 2 files changed: 0 ins; 24 del; 0 mod
  Patch: https://git.openjdk.java.net/panama-vector/pull/42.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/42/head:pull/42

PR: https://git.openjdk.java.net/panama-vector/pull/42

