[vectorIntrinsics] RFR: 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267 [v3]
Jie Fu
jiefu at openjdk.java.net
Thu Mar 4 07:26:15 UTC 2021
> Hi all,
>
> The performance of the Vector API's pow operator has dropped by more than 50% in micro-benchmarks such as:
> Double128Vector.POW
> Double256Vector.POW
> DoubleMaxVector.POW
> DoubleScalar.POW
> Float128Vector.POW
> Float256Vector.POW
> FloatMaxVector.POW
>
> Experiments show that svml's pow intrinsics are slow, except for the 512-bit ones.
> Therefore, only 512-bit vectors should be intrinsified with svml; the svml pow intrinsic should be disabled for all other vector sizes.
>
> Here is the effect of this fix.
>                                          Before              After
> Benchmark            (size)  Mode   Cnt     Score     Error     Score     Error  Units
> --------------------------------------------------------------------------------------
> Double128Vector.POW    1024  thrpt    5    14.895 ±   0.070    31.897 ±   0.203  ops/ms
> Double256Vector.POW    1024  thrpt    5    15.650 ±   1.274    36.690 ±   2.848  ops/ms
> Double512Vector.POW    1024  thrpt    5   263.472 ±   0.062   261.681 ±  13.817  ops/ms
> Double64Vector.POW     1024  thrpt    5    17.881 ±   0.244    17.734 ±   0.184  ops/ms
> DoubleMaxVector.POW    1024  thrpt    5   263.613 ±   0.132   263.085 ±   0.167  ops/ms
> DoubleScalar.POW       1024  thrpt    5    45.268 ±   0.043    45.220 ±   0.013  ops/ms
> Float128Vector.POW     1024  thrpt    5    13.761 ±   0.092    28.578 ±   0.213  ops/ms
> Float256Vector.POW     1024  thrpt    5    13.131 ±   0.101    29.414 ±   0.370  ops/ms
> Float512Vector.POW     1024  thrpt    5   624.449 ± 267.160   649.519 ±   2.295  ops/ms
> Float64Vector.POW      1024  thrpt    5    10.888 ±   0.069    26.376 ±   0.601  ops/ms
> FloatMaxVector.POW     1024  thrpt    5   658.723 ±   2.445   663.723 ±   2.852  ops/ms
> FloatScalar.POW        1024  thrpt    5    30.682 ±   0.095    30.678 ±   0.074  ops/ms
>
> Thanks.
> Best regards,
> Jie
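
For readers who want to check the numbers in the quoted description, here is a minimal sketch that restates the size rule from the mail and computes the After/Before throughput ratio for a few rows of the table. The class name PowIntrinsicSketch and the helpers useSvmlPow and speedup are hypothetical and only illustrate the description; they are not the HotSpot code touched by this pull request. The scores are copied from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PowIntrinsicSketch {
    // Hypothetical helper: restates the rule from the mail
    // (svml pow only for 512-bit vectors). Not the actual HotSpot check.
    static boolean useSvmlPow(int vectorBits) {
        return vectorBits == 512;
    }

    // Throughput ratio After / Before; a value above 1.0 means the fix is faster.
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // {before, after} scores in ops/ms, copied from the table in the mail.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("Double128Vector.POW", new double[] {14.895, 31.897});
        scores.put("Double256Vector.POW", new double[] {15.650, 36.690});
        scores.put("Double512Vector.POW", new double[] {263.472, 261.681});
        scores.put("Float128Vector.POW", new double[] {13.761, 28.578});
        scores.put("Float256Vector.POW", new double[] {13.131, 29.414});
        scores.put("Float512Vector.POW", new double[] {624.449, 649.519});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-20s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }

        // Which vector sizes would use the svml pow intrinsic under the stated rule.
        for (int bits : new int[] {64, 128, 256, 512}) {
            System.out.println(bits + "-bit pow via svml: " + useSvmlPow(bits));
        }
    }
}
```

Running it prints one ratio per benchmark row, followed by the rule applied to each vector size. The ratios ignore the reported error margins, so rows with a large error (such as the Before score of Float512Vector.POW) should be read with care.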
Jie Fu has updated the pull request incrementally with two additional commits since the last revision:
- Fix in stubGenerator_x86_64.cpp
- Revert change
-------------
Changes:
- all: https://git.openjdk.java.net/panama-vector/pull/42/files
- new: https://git.openjdk.java.net/panama-vector/pull/42/files/4ba408ca..0d3dbeba
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=42&range=02
- incr: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=42&range=01-02
Stats: 24 lines in 2 files changed: 0 ins; 24 del; 0 mod
Patch: https://git.openjdk.java.net/panama-vector/pull/42.diff
Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/42/head:pull/42
PR: https://git.openjdk.java.net/panama-vector/pull/42