[vectorIntrinsics] RFR: 8262498: More than 50% performance degradation of pow operator due to call with svml intrinsic after JDK-8261267 [v3]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Thu Mar 4 18:58:54 UTC 2021


On Thu, 4 Mar 2021 07:26:15 GMT, Jie Fu <jiefu at openjdk.org> wrote:

>> Hi all,
>> 
>> Performance of the Vector API's pow operator has decreased by more than 50% in micro benchmarks such as:
>>   Double128Vector.POW
>>   Double256Vector.POW
>>   DoubleMaxVector.POW
>>   DoubleScalar.POW
>>   Float128Vector.POW
>>   Float256Vector.POW
>>   FloatMaxVector.POW
>> 
>> Experiments show that svml's pow intrinsics are slow (except for the 512-bit ones).
>> So this fix intrinsifies pow with svml only for 512-bit vectors and disables it for all other vector sizes.
>> 
>> Here is the effect of this fix.
>>                                            Before                    |                                              After
>> ------------------------------------------------------------------------------------------------------------------------------------------
>> Benchmark            (size)   Mode  Cnt    Score     Error   Units   |   Benchmark            (size)   Mode  Cnt    Score    Error   Units
>> Double128Vector.POW    1024  thrpt    5   14.895 ±   0.070  ops/ms   |   Double128Vector.POW    1024  thrpt    5   31.897 ±  0.203  ops/ms
>> Double256Vector.POW    1024  thrpt    5   15.650 ±   1.274  ops/ms   |   Double256Vector.POW    1024  thrpt    5   36.690 ±  2.848  ops/ms
>> Double512Vector.POW    1024  thrpt    5  263.472 ±   0.062  ops/ms   |   Double512Vector.POW    1024  thrpt    5  261.681 ± 13.817  ops/ms
>> Double64Vector.POW     1024  thrpt    5   17.881 ±   0.244  ops/ms   |   Double64Vector.POW     1024  thrpt    5   17.734 ±  0.184  ops/ms
>> DoubleMaxVector.POW    1024  thrpt    5  263.613 ±   0.132  ops/ms   |   DoubleMaxVector.POW    1024  thrpt    5  263.085 ±  0.167  ops/ms
>> DoubleScalar.POW       1024  thrpt    5   45.268 ±   0.043  ops/ms   |   DoubleScalar.POW       1024  thrpt    5   45.220 ±  0.013  ops/ms
>> Float128Vector.POW     1024  thrpt    5   13.761 ±   0.092  ops/ms   |   Float128Vector.POW     1024  thrpt    5   28.578 ±  0.213  ops/ms
>> Float256Vector.POW     1024  thrpt    5   13.131 ±   0.101  ops/ms   |   Float256Vector.POW     1024  thrpt    5   29.414 ±  0.370  ops/ms
>> Float512Vector.POW     1024  thrpt    5  624.449 ± 267.160  ops/ms   |   Float512Vector.POW     1024  thrpt    5  649.519 ±  2.295  ops/ms
>> Float64Vector.POW      1024  thrpt    5   10.888 ±   0.069  ops/ms   |   Float64Vector.POW      1024  thrpt    5   26.376 ±  0.601  ops/ms
>> FloatMaxVector.POW     1024  thrpt    5  658.723 ±   2.445  ops/ms   |   FloatMaxVector.POW     1024  thrpt    5  663.723 ±  2.852  ops/ms
>> FloatScalar.POW        1024  thrpt    5   30.682 ±   0.095  ops/ms   |   FloatScalar.POW        1024  thrpt    5   30.678 ±  0.074  ops/ms
>> 
>> Thanks.
>> Best regards,
>> Jie
>
> Jie Fu has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Fix in stubGenerator_x86_64.cpp
>  - Revert change

Marked as reviewed by sviswanathan (Committer).
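
For readers comparing the Before and After columns of the quoted table, the
small Java sketch below computes the after/before throughput ratio for a few
rows. The scores are copied from the table; the class name PowSpeedup and the
helper speedup() are hypothetical names chosen for illustration and are not
part of the patch or of the benchmark suite.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PowSpeedup {
    // Ratio of throughput after the fix to throughput before it.
    // Values above 1.0 mean the fix raised throughput (ops/ms).
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // {before, after} scores in ops/ms, copied from the quoted table (size 1024).
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("Double128Vector.POW", new double[] {14.895, 31.897});
        scores.put("Double256Vector.POW", new double[] {15.650, 36.690});
        scores.put("Double512Vector.POW", new double[] {263.472, 261.681});
        scores.put("Float128Vector.POW", new double[] {13.761, 28.578});
        scores.put("Float256Vector.POW", new double[] {13.131, 29.414});

        // Print one line per benchmark with its after/before ratio.
        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-22s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

With these figures the 128-bit and 256-bit rows come out at a little over 2x,
while the 512-bit row stays close to 1x, which matches the description that
only the non-512-bit svml pow paths were slow.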

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/42

