RFR(S) 8029302: Performance regression in Math.pow intrinsic
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Apr 23 19:08:06 UTC 2014
On 4/23/14 4:06 AM, Niclas Adlertz wrote:
> Hi Vladimir,
>
> > x**0 = 1
> > x**1 = x
> > x**-1 = 1/x
> > x**0.5 = sqrt(x)
> I will file a new bug for these simple cases as well.
Agree.
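
For reference, the four simple cases can be written down as a small Java sketch. This is only an illustration of the identities and of the ordering mentioned below (the NaN check comes after x**0 = 1 and before the rest); powSimple is a hypothetical helper, not HotSpot or fdlibm code, and the sign-bit guard on the sqrt case is my assumption about how -0.0 has to be kept out of that shortcut.

```java
public class PowSimpleCases {
    /**
     * Hypothetical helper, not HotSpot code: applies the simple cases in
     * order and defers everything else to Math.pow.
     */
    static double powSimple(double x, double y) {
        if (y == 0.0) return 1.0;            // x**0 = 1, even if x is NaN
        if (x != x || y != y) return x + y;  // any other NaN operand gives NaN
        if (y == 1.0) return x;              // x**1 = x
        if (y == -1.0) return 1.0 / x;       // x**-1 = 1/x
        // x**0.5 = sqrt(x), taken only when the sign bit of x is clear
        // (assumption: sqrt(-0.0) is -0.0 but pow(-0.0, 0.5) is +0.0).
        if (y == 0.5 && Double.doubleToRawLongBits(x) >= 0L) {
            return Math.sqrt(x);
        }
        return Math.pow(x, y);               // general case
    }

    public static void main(String[] args) {
        System.out.println(powSimple(Double.NaN, 0.0)); // x**0 with NaN base
        System.out.println(powSimple(7.5, 1.0));        // x**1
        System.out.println(powSimple(4.0, -1.0));       // x**-1
        System.out.println(powSimple(9.0, 0.5));        // x**0.5
    }
}
```

Which of these are worth a dedicated check in C2 still depends on how often each exponent shows up in real code.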
>
> > Also there is a check for NaN before all these cases except x**0 = 1:
> >
> > /* +-NaN return x+y */
> This case returns NaN if either x (when y != 0) or y is NaN?
> If so, yes, we should handle this simple case as well.
> (Why do we return x+y? Would not
>     if (x != x) return x;
>     if (y != y) return y;
> be clearer and faster since we can skip the addition?)
cmp+branch are a lot more expensive than add.
>
> The only case of NaN I will catch when y == 2 is NaN**2. And that should
> result in NaN since we will do x * x (NaN * NaN).
Okay.
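
A small Java sketch of the two NaN points discussed here: once a NaN operand is known to be present, x + y already yields NaN without any further compare, and x * x yields NaN for a NaN input, so the y == 2 shortcut needs no separate NaN check. The class and method names are hypothetical and only serve the illustration.

```java
public class NaNPropagation {
    /** Branch-free result for the NaN case: the sum is NaN if either operand is. */
    static double nanResultByAdd(double x, double y) {
        return x + y;
    }

    /** The compare-and-branch form from the question, for comparison. */
    static double nanResultByBranches(double x, double y) {
        if (x != x) return x;   // x is NaN
        if (y != y) return y;   // y is NaN
        return x + y;           // not reached when an operand is NaN
    }

    /** What the y == 2 shortcut computes. */
    static double square(double x) {
        return x * x;
    }

    public static void main(String[] args) {
        System.out.println(nanResultByAdd(Double.NaN, 3.0));
        System.out.println(nanResultByBranches(3.0, Double.NaN));
        System.out.println(square(Double.NaN));
    }
}
```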
Thanks,
Vladimir
>
> Kind Regards,
> Niclas Adlertz
>
> On 04/17/2014 03:45 PM, Vladimir Kozlov wrote:
>> Niclas,
>>
>> Looking at __ieee754_pow() in sharedRuntimeTrans.cpp, I see it has other
>> simple cases:
>>
>> x**0 = 1
>> x**1 = x
>> x**-1 = 1/x
>> x**0.5 = sqrt(x)
>>
>> It would be nice to know which are frequently used and implement them
>> too.
>>
>> Also there is a check for NaN before all these cases except x**0 = 1:
>>
>> /* +-NaN return x+y */
>>
>> You need to test that the new C2 code produces the same results for NaN
>> values.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/17/14 3:10 AM, Niclas Adlertz wrote:
>>> Hi all,
>>>
>>> webrev: http://cr.openjdk.java.net/~adlertz/JDK-8029302/webrev00/
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8029302
>>>
>>> We have a performance regression in Math.pow(x,2) on x64, starting
>>> from 7u40.
>>> In 7u40 we replaced a call to SharedRuntime::dpow with an intrinsic
>>> for Math.pow. This is faster in almost all cases,
>>> except for Math.pow(x,2). (See comments in bug report for more info.)
>>>
>>> I have added a C2 IR check for Math.pow(x,y) when y == 2, and instead
>>> of calling SharedRuntime::dpow when y == 2, I
>>> directly do x * x.
>>>
>>> I've changed the generated C2 IR,
>>>
>>> From (pseudocode):
>>>
>>> if (x <= 0.0) {
>>>   long longy = (long)y;
>>>   if ((double)longy == y) { // if y is long
>>>     if (y + 1 == y) longy = 0; // huge number: even
>>>     // even y gives a positive result, odd y a negative one
>>>     result = ((1&longy) == 0) ? DPow(abs(x), y) : -DPow(abs(x), y);
>>>   } else {
>>>     result = NaN;
>>>   }
>>> } else {
>>>   result = DPow(x,y);
>>> }
>>> if (result != result)? {
>>>   result = uncommon_trap() or runtime_call();
>>> }
>>> return result;
>>>
>>> To (pseudocode):
>>>
>>> if (y == 2) {
>>>   return x * x;
>>> } else {
>>>   if (x <= 0.0) {
>>>     long longy = (long)y;
>>>     if ((double)longy == y) { // if y is long
>>>       if (y + 1 == y) longy = 0; // huge number: even
>>>       // even y gives a positive result, odd y a negative one
>>>       result = ((1&longy) == 0) ? DPow(abs(x), y) : -DPow(abs(x), y);
>>>     } else {
>>>       result = NaN;
>>>     }
>>>   } else {
>>>     result = DPow(x,y);
>>>   }
>>>   if (result != result)? {
>>>     result = uncommon_trap() or runtime_call();
>>>   }
>>>   return result;
>>> }
>>>
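
The shape of the y == 2 check above can be mirrored at the Java level. The sketch below is not the C2 IR from the webrev; PowFastPath.pow is a hypothetical wrapper that only shows the control flow: test the exponent first, multiply if it is 2, otherwise take the general path (Math.pow stands in for the intrinsic here).

```java
public class PowFastPath {
    /**
     * Hypothetical wrapper, not the C2 change itself: squares directly when
     * the exponent is exactly 2 and falls back to Math.pow otherwise.
     */
    static double pow(double x, double y) {
        if (y == 2.0) {
            return x * x;       // fast path: one multiply, NaN stays NaN
        }
        return Math.pow(x, y);  // general path
    }

    public static void main(String[] args) {
        System.out.println(pow(3.0, 2.0));          // fast path
        System.out.println(pow(-4.0, 2.0));         // fast path, negative base
        System.out.println(pow(Double.NaN, 2.0));   // fast path, NaN base
        System.out.println(pow(2.0, 10.0));         // general path
    }
}
```

Because the comparison is against exactly 2.0, a NaN exponent never takes the fast path, and a NaN base simply flows through the multiply.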
>>> I have run jtreg tests in jdk/tests/java/lang (with -server, -Xcomp
>>> and -XX:-TieredCompilation) and run JPRT. No
>>> problems encountered.
>>> In particular, java/lang/Math/PowTests passes.
>>>
>>> I rewrote the performance test included in the bug report
>>> (https://bugs.openjdk.java.net/secure/attachment/17807/Main.java)
>>> as a JMH test:
>>> http://cr.openjdk.java.net/~adlertz/JDK-8029302/webrev00/MyBenchmark.java
>>>
>>>
>>> Below are the performance results. The x^2 case is now much faster
>>> even compared to 7u25. (Since we now skip the call to
>>> SharedRuntime::dpow)
>>>
>>> Numbers from 7u25 b34:
>>> Iteration 1: 46764.923 ops/ms
>>> Iteration 2: 46695.196 ops/ms
>>> Iteration 3: 46647.386 ops/ms
>>> Iteration 4: 46806.854 ops/ms
>>> Iteration 5: 46787.259 ops/ms
>>> Iteration 6: 46788.196 ops/ms
>>> Iteration 7: 46797.500 ops/ms
>>> Iteration 8: 46784.237 ops/ms
>>> Iteration 9: 46782.717 ops/ms
>>> Iteration 10: 46790.678 ops/ms
>>> Iteration 11: 46785.139 ops/ms
>>> Iteration 12: 46798.346 ops/ms
>>> Iteration 13: 46784.595 ops/ms
>>> Iteration 14: 46770.963 ops/ms
>>> Iteration 15: 46789.574 ops/ms
>>> Iteration 16: 46822.452 ops/ms
>>> Iteration 17: 46813.571 ops/ms
>>> Iteration 18: 46747.076 ops/ms
>>> Iteration 19: 46774.254 ops/ms
>>> Iteration 20: 46779.329 ops/ms
>>>
>>> Result : 46775.512 ±(99.9%) 34.788 ops/ms
>>> Statistics: (min, avg, max) = (46647.386, 46775.512, 46822.452), stdev = 40.061
>>> Confidence interval (99.9%): [46740.725, 46810.300]
>>>
>>>
>>> Numbers from 7u40 b34:
>>> Iteration 1: 9966.052 ops/ms
>>> Iteration 2: 9967.683 ops/ms
>>> Iteration 3: 9967.229 ops/ms
>>> Iteration 4: 9967.266 ops/ms
>>> Iteration 5: 9937.091 ops/ms
>>> Iteration 6: 9966.272 ops/ms
>>> Iteration 7: 9964.679 ops/ms
>>> Iteration 8: 9966.326 ops/ms
>>> Iteration 9: 9964.899 ops/ms
>>> Iteration 10: 9966.920 ops/ms
>>> Iteration 11: 9963.278 ops/ms
>>> Iteration 12: 9967.334 ops/ms
>>> Iteration 13: 9963.351 ops/ms
>>> Iteration 14: 9968.032 ops/ms
>>> Iteration 15: 9964.312 ops/ms
>>> Iteration 16: 9967.080 ops/ms
>>> Iteration 17: 9965.114 ops/ms
>>> Iteration 18: 9966.860 ops/ms
>>> Iteration 19: 9965.375 ops/ms
>>> Iteration 20: 9966.215 ops/ms
>>>
>>> Result : 9964.568 ±(99.9%) 5.743 ops/ms
>>> Statistics: (min, avg, max) = (9937.091, 9964.568, 9968.032), stdev = 6.613
>>> Confidence interval (99.9%): [9958.826, 9970.311]
>>>
>>>
>>> Numbers from http://hg.openjdk.java.net/jdk9/hs-comp/hotspot without
>>> the y == 2 check:
>>> Iteration 1: 9966.775 ops/ms
>>> Iteration 2: 9964.514 ops/ms
>>> Iteration 3: 9959.708 ops/ms
>>> Iteration 4: 9965.501 ops/ms
>>> Iteration 5: 9958.087 ops/ms
>>> Iteration 6: 9964.471 ops/ms
>>> Iteration 7: 9964.966 ops/ms
>>> Iteration 8: 9965.132 ops/ms
>>> Iteration 9: 9959.055 ops/ms
>>> Iteration 10: 9964.666 ops/ms
>>> Iteration 11: 9965.649 ops/ms
>>> Iteration 12: 9964.309 ops/ms
>>> Iteration 13: 9966.963 ops/ms
>>> Iteration 14: 9956.511 ops/ms
>>> Iteration 15: 9964.881 ops/ms
>>> Iteration 16: 9966.927 ops/ms
>>> Iteration 17: 9951.054 ops/ms
>>> Iteration 18: 9966.512 ops/ms
>>> Iteration 19: 9967.041 ops/ms
>>> Iteration 20: 9967.198 ops/ms
>>>
>>> Result : 9963.496 ±(99.9%) 3.760 ops/ms
>>> Statistics: (min, avg, max) = (9951.054, 9963.496, 9967.198), stdev = 4.330
>>> Confidence interval (99.9%): [9959.736, 9967.256]
>>>
>>>
>>> Numbers from http://hg.openjdk.java.net/jdk9/hs-comp/hotspot with the
>>> y == 2 check:
>>> Iteration 1: 276969.757 ops/ms
>>> Iteration 2: 276809.529 ops/ms
>>> Iteration 3: 276621.258 ops/ms
>>> Iteration 4: 276352.094 ops/ms
>>> Iteration 5: 276922.865 ops/ms
>>> Iteration 6: 276617.189 ops/ms
>>> Iteration 7: 276941.087 ops/ms
>>> Iteration 8: 276215.547 ops/ms
>>> Iteration 9: 276118.685 ops/ms
>>> Iteration 10: 276550.807 ops/ms
>>> Iteration 11: 276773.424 ops/ms
>>> Iteration 12: 276871.125 ops/ms
>>> Iteration 13: 276059.947 ops/ms
>>> Iteration 14: 277109.329 ops/ms
>>> Iteration 15: 276910.165 ops/ms
>>> Iteration 16: 276138.922 ops/ms
>>> Iteration 17: 276083.749 ops/ms
>>> Iteration 18: 276367.479 ops/ms
>>> Iteration 19: 276563.471 ops/ms
>>> Iteration 20: 276022.425 ops/ms
>>>
>>> Result : 276550.943 ±(99.9%) 309.657 ops/ms
>>> Statistics: (min, avg, max) = (276022.425, 276550.943, 277109.329), stdev = 356.601
>>> Confidence interval (99.9%): [276241.286, 276860.600]
>>>
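
To put the four averages above side by side, the ratios can be computed directly. The sketch below uses only the reported "Result" values; the class and constant names are hypothetical. By my arithmetic the 7u25 to 7u40 regression is roughly 4.7x, and the y == 2 check is about 27.8x faster than the same jdk9 build without it and about 5.9x faster than 7u25.

```java
public class PowSpeedup {
    // Average throughput in ops/ms, copied from the JMH "Result" lines above.
    static final double AVG_7U25 = 46775.512;
    static final double AVG_7U40 = 9964.568;
    static final double AVG_JDK9_WITHOUT_CHECK = 9963.496;
    static final double AVG_JDK9_WITH_CHECK = 276550.943;

    /** Ratio of two throughputs; above 1.0 means the first one is faster. */
    static double ratio(double faster, double slower) {
        return faster / slower;
    }

    public static void main(String[] args) {
        System.out.printf("7u25 vs 7u40:            %.2fx%n",
                ratio(AVG_7U25, AVG_7U40));
        System.out.printf("with vs without check:   %.2fx%n",
                ratio(AVG_JDK9_WITH_CHECK, AVG_JDK9_WITHOUT_CHECK));
        System.out.printf("with check vs 7u25:      %.2fx%n",
                ratio(AVG_JDK9_WITH_CHECK, AVG_7U25));
    }
}
```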