[10] RFR (S): C2: missing strength reduction of Math.pow(x, 2.0D) to x*x

Wed Dec 13 14:58:50 UTC 2017

Thanks for review, Vladimir.

> I agree with suggested fix. What check we have before 9 and where was it?

It was quite complicated. There were LibraryCallKit::inline_pow() [1] & 
LibraryCallKit::finish_pow_exp() [2] with a runtime check for 2.0 
exponent, multiple PowDs on fast paths, and a call into 
SharedRuntime::dpow on slow path. PowD had matcher rules [3] which 
produced x87-based implementation.

> Why do you need to move value to tmp1?
> 
> +  movdq(tmp1, xmm1);
> +  cmp64(tmp1, ExternalAddress(DOUBLE2));

We want to compare 64-bit values and move the exponent from XMM to GP 
register.

void Assembler::movdq(Register dst, XMMRegister src) {
void MacroAssembler::cmp64(Register src1, AddressLiteral src2) {

Best regards,
Vladimir Ivanov

[1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.72
[2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.16
[3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l13.7

> On 12/12/17 11:16 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8190869
>>
>> Math.pow() was completely rewritten on 9 and important special case
>> (exponent == 2.0) was missed. It caused severe performance regression
>> (up to 10x) on some numeric code.
>>
>> The fix is two-fold:
>>    * add special case in C2 to transform pow(x,2.0) into x*x;
>>    * for consistency of results between different execution modes
>> (interpreter, C1, and C2), add special case into generated stub at
>> MacroAssembler::fast_pow().
>>
>> Some clarifications follow.
>>
>> Testing:
>>    * microbenchmark
>>    * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress)
>>    * test case from 8189172 on consistency of results
>>
>> The fix doesn't cover all possible cases (e.g., when exponent is not a
>> constant during parsing, but becomes such later). I experimented with
>> more elaborate fix, but wasn't satisfied with the results and chose
>> simpler route.
>>
>> I see 2 alternatives:
>>
>> (1) Introduce macro node PowD and either transform it into MulD if
>> exponent == 2.0D or expand it into CallLeafNode. The downside is that
>> PowD should be CallNode and replacing it with a MulD requires some IR
>> tweaking around it.
>>
>> (2) Introduce a runtime check before calling the stub (prototype [1]).
>> If exponent becomes 2.0 later, the check is folded and stub call is
>> removed. The downside is that when exponent doesn't turn into 2.0, the
>> check stays and duplicates the check in the stub. Unfortunately, there's
>> no way to remove it from the stub, because interpreter & C1 relies on
>> it. Otherwise, inconsistent computation results become possible (like
>> JDK-8189172 [2]).
>>
>> So, I decided to go with a simple fix and cover most common case
>> (explicit usage of 2.0 with Math.pow) for now.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check
>> [2] https://bugs.openjdk.java.net/browse/JDK-8189172