[vectorIntrinsics] Integrated: 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267

Jie Fu jiefu at openjdk.java.net
Thu Mar 4 23:39:44 UTC 2021


On Sat, 27 Feb 2021 13:31:08 GMT, Jie Fu <jiefu at openjdk.org> wrote:

> Hi all,
> 
> The performance of the Vector API's pow operator has decreased by more than 50% in micro benchmarks such as:
>   Double128Vector.POW
>   Double256Vector.POW
>   DoubleMaxVector.POW
>   DoubleScalar.POW
>   Float128Vector.POW
>   Float256Vector.POW
>   FloatMaxVector.POW
> 
> Experiments show that SVML's pow intrinsics are slow, except for the 512-bit ones.
> So only 512-bit vectors should be intrinsified with SVML for pow, and the other vector sizes should be disabled.
> 
> Here is the effect of this fix (JMH throughput, higher is better):
>                                            Before                    |                                              After
> ------------------------------------------------------------------------------------------------------------------------------------------
> Benchmark            (size)   Mode  Cnt    Score     Error   Units   |   Benchmark            (size)   Mode  Cnt    Score    Error   Units
> Double128Vector.POW    1024  thrpt    5   14.895 ±   0.070  ops/ms   |   Double128Vector.POW    1024  thrpt    5   31.897 ±  0.203  ops/ms
> Double256Vector.POW    1024  thrpt    5   15.650 ±   1.274  ops/ms   |   Double256Vector.POW    1024  thrpt    5   36.690 ±  2.848  ops/ms
> Double512Vector.POW    1024  thrpt    5  263.472 ±   0.062  ops/ms   |   Double512Vector.POW    1024  thrpt    5  261.681 ± 13.817  ops/ms
> Double64Vector.POW     1024  thrpt    5   17.881 ±   0.244  ops/ms   |   Double64Vector.POW     1024  thrpt    5   17.734 ±  0.184  ops/ms
> DoubleMaxVector.POW    1024  thrpt    5  263.613 ±   0.132  ops/ms   |   DoubleMaxVector.POW    1024  thrpt    5  263.085 ±  0.167  ops/ms
> DoubleScalar.POW       1024  thrpt    5   45.268 ±   0.043  ops/ms   |   DoubleScalar.POW       1024  thrpt    5   45.220 ±  0.013  ops/ms
> Float128Vector.POW     1024  thrpt    5   13.761 ±   0.092  ops/ms   |   Float128Vector.POW     1024  thrpt    5   28.578 ±  0.213  ops/ms
> Float256Vector.POW     1024  thrpt    5   13.131 ±   0.101  ops/ms   |   Float256Vector.POW     1024  thrpt    5   29.414 ±  0.370  ops/ms
> Float512Vector.POW     1024  thrpt    5  624.449 ± 267.160  ops/ms   |   Float512Vector.POW     1024  thrpt    5  649.519 ±  2.295  ops/ms
> Float64Vector.POW      1024  thrpt    5   10.888 ±   0.069  ops/ms   |   Float64Vector.POW      1024  thrpt    5   26.376 ±  0.601  ops/ms
> FloatMaxVector.POW     1024  thrpt    5  658.723 ±   2.445  ops/ms   |   FloatMaxVector.POW     1024  thrpt    5  663.723 ±  2.852  ops/ms
> FloatScalar.POW        1024  thrpt    5   30.682 ±   0.095  ops/ms   |   FloatScalar.POW        1024  thrpt    5   30.678 ±  0.074  ops/ms
> 
> Thanks.
> Best regards,
> Jie
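As a quick arithmetic check of the "more than 50%" figure in the subject, the sketch below recomputes the loss for each row of the table above. It assumes "degradation" means the fraction of the fixed build's throughput that was lost before the fix, i.e. 1 - before/after; the class name PowRegression and the method degradation are illustrative only and are not part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PowRegression {
    // JMH scores in ops/ms copied from the table: {before fix, after fix}.
    static final Map<String, double[]> SCORES = new LinkedHashMap<>();
    static {
        SCORES.put("Double128Vector.POW", new double[]{14.895, 31.897});
        SCORES.put("Double256Vector.POW", new double[]{15.650, 36.690});
        SCORES.put("Double512Vector.POW", new double[]{263.472, 261.681});
        SCORES.put("Double64Vector.POW",  new double[]{17.881, 17.734});
        SCORES.put("DoubleMaxVector.POW", new double[]{263.613, 263.085});
        SCORES.put("DoubleScalar.POW",    new double[]{45.268, 45.220});
        SCORES.put("Float128Vector.POW",  new double[]{13.761, 28.578});
        SCORES.put("Float256Vector.POW",  new double[]{13.131, 29.414});
        SCORES.put("Float512Vector.POW",  new double[]{624.449, 649.519});
        SCORES.put("Float64Vector.POW",   new double[]{10.888, 26.376});
        SCORES.put("FloatMaxVector.POW",  new double[]{658.723, 663.723});
        SCORES.put("FloatScalar.POW",     new double[]{30.682, 30.678});
    }

    /**
     * Assumed metric: fraction of the post-fix throughput missing before the fix.
     * A negative value means the pre-fix build was slightly faster.
     */
    static double degradation(double before, double after) {
        return 1.0 - before / after;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : SCORES.entrySet()) {
            double[] s = e.getValue();
            // Print the degradation as a percentage, one benchmark per line.
            System.out.printf("%-20s %6.1f%%%n", e.getKey(), 100.0 * degradation(s[0], s[1]));
        }
    }
}
```

With this definition the 128-bit and 256-bit rows and Float64Vector.POW come out at roughly 52% to 59%, while the 512-bit, Max and scalar rows stay within a few percent (the pre-fix Float512Vector.POW score carries an error of ±267 ops/ms, so its difference is within noise).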

This pull request has now been integrated.

Changeset: 0523be61
Author:    Jie Fu <jiefu at openjdk.org>
Committer: Sandhya Viswanathan <sviswanathan at openjdk.org>
URL:       https://git.openjdk.java.net/panama-vector/commit/0523be61
Stats:     16 lines in 1 file changed: 0 ins; 16 del; 0 mod

8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267

Reviewed-by: sviswanathan

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/42

