[aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for pow

Fri Aug 24 08:31:34 UTC 2018

On 08/23/2018 01:31 PM, Dmitrij Pochepko wrote:
> 
> 
> On 22/08/18 16:43, Andrew Haley wrote:
>> On 08/22/2018 11:04 AM, Andrew Dinn wrote:
>>> Thank you for the revised webrev and new test results. I am now working
>>> through them.
>> I wonder about the validity of
>>
>>       L1X+ x *(L2X+ x *(L3X+  x   *  (L4X+ x *(L5X+ x *L6X)))) is calculated as:
>>
>>       L1X+ x *(L2X+ x *L3X)+  x^3 *  (L4X+ x *(L5X+ x *L6X)),
>>
>> where L1X+ x *(L2X+ x *L3X)
>>        L4X+ x *(L5X+ x *L6X) are calculated simultaneously in vector (fmlavs)
>>
>>        (On the range [0,0.1716])
>>
>>
>> This transformation looks like a variant of Estrin's scheme, but it's
>> not quite the same.  I can see no convincing reason why it should be
>> invalid, but its rounding and underflow behaviour will be different
>> from Horner's scheme.  Having said that, the use of fmla should mean
>> that the error is less than the original code, which didn't use fused
>> multiply-add at all.
>>
> Well, I suppose the most questionable range is where X is near 0 (that is,
> when the input argument is near 1.0).
> I created a separate brute-force test (run with -Xcomp), which compares
> Math.pow with StrictMath.pow using all representable double values
> within a given range, and found no differences.
> I used the input argument range 0.9999...1.0001 (so that X values in this
> polynomial are in [0, 0.000049998]). The input argument range has
> 1.351079888×10¹² double values, and for all these values the results were
> correct.

Sure, it's probably fine, but that's not really an error analysis.
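
For reference, a minimal Java sketch of such an exhaustive sweep follows.
It is not the test from the webrev: the class name PowBruteForce and both
method names are hypothetical, the exponent is fixed by the caller, and the
range is assumed to be positive and finite. Note that Math.pow is only
specified to be within 1 ulp of the exact result, so a bitwise mismatch
against StrictMath.pow would not by itself be a specification violation.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical helper, not part of the webrev under review.
final class PowBruteForce {

    // Number of representable doubles in [lo, hi].
    // Assumes 0 < lo <= hi and both finite: for positive doubles the
    // raw bit patterns are ordered the same way as the values.
    static long countDoubles(double lo, double hi) {
        return Double.doubleToLongBits(hi) - Double.doubleToLongBits(lo) + 1;
    }

    // Visits every double in [lo, hi] and counts the inputs for which
    // candidate and reference return different bit patterns.
    // Assumes hi is finite (otherwise the loop would not terminate).
    static long countMismatches(double lo, double hi, double y,
                                DoubleBinaryOperator candidate,
                                DoubleBinaryOperator reference) {
        long mismatches = 0;
        for (double x = lo; x <= hi; x = Math.nextUp(x)) {
            long got = Double.doubleToLongBits(candidate.applyAsDouble(x, y));
            long want = Double.doubleToLongBits(reference.applyAsDouble(x, y));
            if (got != want) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

For example, PowBruteForce.countMismatches(0.9999, 1.0001, y, Math::pow,
StrictMath::pow) would walk the range quoted above for one exponent y, and
countDoubles(0.9999, 1.0001) gives the number of values visited.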

I'm curious, though: why did you not use a second-order variant of
Horner's scheme, with one limb calculating the odd powers and the
other the even powers, combining them with a final fused multiply-add?
It would be more conventional, and you'd be using multiply-add at
every stage, minimizing rounding errors.
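
To make the comparison concrete, here is a sketch of the three evaluation
orders for a degree-5 polynomial in Java, using Math.fma (Java 9 and later).
The class name PolySchemes and the coefficient array c are hypothetical;
c[0]..c[5] stand in for L1X..L6X, whose actual values are not reproduced
here. This is scalar code and says nothing about how the intrinsic
schedules the vector fmla.

```java
// Hypothetical illustration; c[0]..c[5] play the role of L1X..L6X.
final class PolySchemes {

    // Plain Horner: c0 + x*(c1 + x*(c2 + x*(c3 + x*(c4 + x*c5)))).
    static double horner(double[] c, double x) {
        double r = c[5];
        for (int i = 4; i >= 0; i--) {
            r = Math.fma(r, x, c[i]);
        }
        return r;
    }

    // The split discussed in the quoted comment:
    // (c0 + x*(c1 + x*c2)) + x^3 * (c3 + x*(c4 + x*c5)).
    // The two halves are independent of each other.
    static double splitCubic(double[] c, double x) {
        double low = Math.fma(Math.fma(c[2], x, c[1]), x, c[0]);
        double high = Math.fma(Math.fma(c[5], x, c[4]), x, c[3]);
        double x3 = x * x * x; // two separately rounded multiplies
        return Math.fma(x3, high, low);
    }

    // Second-order Horner: even and odd powers in separate limbs,
    // (c0 + x^2*(c2 + x^2*c4)) + x * (c1 + x^2*(c3 + x^2*c5)).
    static double evenOdd(double[] c, double x) {
        double x2 = x * x; // one rounded multiply
        double even = Math.fma(Math.fma(c[4], x2, c[2]), x2, c[0]);
        double odd = Math.fma(Math.fma(c[5], x2, c[3]), x2, c[1]);
        return Math.fma(x, odd, even);
    }
}
```

All three are algebraically identical; they differ only in where rounding
occurs. In this sketch splitCubic forms x^3 with two rounded multiplies
before the final fma, whereas evenOdd needs only the single rounded x^2.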

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671