[aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for pow
Andrew Haley
aph at redhat.com
Fri Aug 24 08:31:34 UTC 2018
On 08/23/2018 01:31 PM, Dmitrij Pochepko wrote:
>
>
> On 22/08/18 16:43, Andrew Haley wrote:
>> On 08/22/2018 11:04 AM, Andrew Dinn wrote:
>>> Thank you for the revised webrev and new test results. I am now working
>>> through them.
>> I wonder about the validity of
>>
>>   L1X + x*(L2X + x*(L3X + x*(L4X + x*(L5X + x*L6X)))) is calculated as:
>>
>>   L1X + x*(L2X + x*L3X) + x^3*(L4X + x*(L5X + x*L6X)),
>>
>> where L1X + x*(L2X + x*L3X) and
>> L4X + x*(L5X + x*L6X) are calculated simultaneously in vector (fmlavs)
>>
>> (On the range [0,0.1716])
>>
>>
>> This transformation looks like a variant of Estrin's scheme, but it's
>> not quite the same. I can see no convincing reason why it should be
>> invalid, but its rounding and underflow behaviour will be different
>> from Horner's scheme. Having said that, the use of fmla should mean
>> that the error is less than the original code, which didn't use fused
>> multiply-add at all.
>>
> Well, I suppose the most questionable range is where X is near 0 (that
> is, when the input argument is near 1.0).
> I created a separate brute-force test (run with -Xcomp), which compares
> Math.pow with StrictMath.pow using all representable double values
> within a given range, and found no differences.
> I used the input argument range 0.9999...1.0001 (so that X values in
> this polynomial are in [0, 0.000049998]). The input argument range has
> 1.351079888×10¹² double values, and for all these values the results
> were correct.
Sure, it's probably fine, but that's not really an error analysis.
I'm curious, though: why did you not use a second-order variant of
Horner's scheme, with one limb calculating the odd powers and the
other the even powers, combining them with a final fused multiply-add?
It would be more conventional, and you'd be using multiply-add at
every stage, minimizing rounding errors.
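
For illustration only, here is a minimal Java sketch of the three
evaluation orders under discussion: plain Horner, the split form quoted
above, and the second-order (even/odd) Horner variant. The class name,
method names and coefficient values are hypothetical placeholders, not
the L1X..L6X constants from the webrev, and Math.fma (Java 9+) stands in
for the fmadd/fmla instructions. How the split form combines its two
halves in the actual patch is an assumption here.

```java
class PolySchemes {
    // Placeholder coefficients standing in for L1X..L6X (NOT the real values).
    static final double[] C = {1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125};

    // Plain Horner: C0 + x*(C1 + x*(C2 + x*(C3 + x*(C4 + x*C5)))),
    // one fused multiply-add per step, fully serial.
    static double horner(double x) {
        double r = C[5];
        for (int i = 4; i >= 0; i--) {
            r = Math.fma(x, r, C[i]);
        }
        return r;
    }

    // Split form from the quoted comment:
    // (C0 + x*(C1 + x*C2)) + x^3 * (C3 + x*(C4 + x*C5)).
    // The two halves are independent, so they can be computed in parallel.
    // Assumption: the halves are combined with one final fused multiply-add.
    static double split(double x) {
        double lo = Math.fma(x, Math.fma(x, C[2], C[1]), C[0]);
        double hi = Math.fma(x, Math.fma(x, C[5], C[4]), C[3]);
        double x3 = x * x * x; // two separately rounded multiplies
        return Math.fma(x3, hi, lo);
    }

    // Second-order Horner: one limb for even powers, one for odd powers,
    // both Horner chains in x^2, joined by a final fused multiply-add.
    static double evenOdd(double x) {
        double x2 = x * x;
        double even = Math.fma(x2, Math.fma(x2, C[4], C[2]), C[0]); // C0 + C2*x^2 + C4*x^4
        double odd  = Math.fma(x2, Math.fma(x2, C[5], C[3]), C[1]); // C1 + C3*x^2 + C5*x^4
        return Math.fma(x, odd, even);
    }

    public static void main(String[] args) {
        double x = 0.1;
        System.out.println(horner(x));
        System.out.println(split(x));
        System.out.println(evenOdd(x));
    }
}
```

All three are algebraically the same degree-5 polynomial; they differ
only in where the roundings fall, which is the point at issue.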
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671