[aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for pow
Andrew Dinn
adinn at redhat.com
Fri Sep 21 08:55:32 UTC 2018
On 20/09/18 18:27, Dmitrij Pochepko wrote:
> On 14/09/18 18:29, Andrew Dinn wrote:
> Yes, I do have relevant background in mathematics. And yes to the
> questions below, except for the last one. That said, it's always good
> to have another pair of eyes on the review. To be honest, I had to
> refresh my memory regarding Remez polynomials.
Thank you for the confirmation.
. . .
>> Can you provide a mathematical proof that the variations you have
>> introduced into the computational process (specifically the move from
>> Horner form to Estrin form) will not introduce rounding errors?
> I have formal verification for some argument ranges that I considered
> the most problematic, but the complete proof is too complicated. Looking
> at the situation from the reviewer's side, I understand why it will be
> safer and easier to maintain if the assembly version duplicates the
> original fdlibm code, and because of that I suggest reverting the
> questionable places to the original schemes, as the performance
> improvement is not that big.
Ok, use of Horner form was one of the things I was going to ask you to
restore. I did actually ask Joe Darcy about the use of Estrin form. If
he can provide an argument that it is ok to employ it then we can think
about reinstating the vector computation as an upgrade at a later date.
I am not surprised its removal makes only a small difference, given how
little of the computation it represents.
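For readers following along, the difference between the two evaluation
schemes can be sketched as follows (a minimal Java illustration with
made-up coefficients, not the actual fdlibm pow constants):

```java
public class PolyForms {
    // Horner form: a strictly sequential chain,
    // c0 + x*(c1 + x*(c2 + x*c3)); each step depends on the previous one.
    static double horner(double x, double c0, double c1, double c2, double c3) {
        return c0 + x * (c1 + x * (c2 + x * c3));
    }

    // Estrin form: pairs of terms evaluated independently, which exposes
    // instruction-level / SIMD parallelism but rounds the intermediate
    // results in a different order than Horner form does.
    static double estrin(double x, double c0, double c1, double c2, double c3) {
        double x2 = x * x;
        return (c0 + c1 * x) + x2 * (c2 + c3 * x);
    }

    public static void main(String[] args) {
        double x = 0.7;
        double h = horner(x, 1.0, 0.5, 0.25, 0.125);
        double e = estrin(x, 1.0, 0.5, 0.25, 0.125);
        // Algebraically identical, but the two results can disagree in
        // the last bit for some inputs: that is the rounding concern.
        System.out.println(h + " vs " + e);
    }
}
```

The two forms compute the same polynomial; the open question in this
thread is whether the reordered roundings in Estrin form can ever push
the final pow result outside the permitted error bound.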
> New webrev with the polynomial calculations changed back to the
> original scheme. Also changed the scalbn implementation to be the same
> as the original:
> http://cr.openjdk.java.net/~dpochepk/8189107/webrev.03/
So, I guess that means you have now actually tested the underflow case?
The previous scalbn implementation was one of two places in which the
code was seriously broken. In consequence, all computations meant to
generate underflowing results produced incorrect values.
One of the things wrong in the previous scalbn implementation was the
use of cmp rather than cmpw, an error which I see you have now fixed
(there were 3 further mistakes in this section of the code). Your
rewrite of the case handling looks complex and unnecessarily slow to me.
I'll suggest a better fix in a review I am currently writing up.
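A minimal Java check of that underflow path, for the record: a pow
result that underflows into the subnormal range exercises the scalbn
tail of the fdlibm algorithm. The exponent here is just an illustrative
value, not one taken from the original test regime:

```java
public class UnderflowCheck {
    public static void main(String[] args) {
        // 2^-1040 lies below Double.MIN_NORMAL (2^-1022) but above
        // Double.MIN_VALUE (2^-1074), so the result must be a
        // positive subnormal, not zero.
        double fast = Math.pow(2.0, -1040.0);         // may use the intrinsic
        double strict = StrictMath.pow(2.0, -1040.0); // fdlibm reference
        System.out.println("Math.pow       = " + fast);
        System.out.println("StrictMath.pow = " + strict);
        System.out.println("subnormal? "
                + (fast > 0.0 && fast < Double.MIN_NORMAL));
    }
}
```

A broken scalbn path shows up here as a zero, an infinity, or a value
wildly different from the StrictMath result.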
The other place where there was an error was in the Y_IS_HUGE branch.
The vector maths code expects to load the first 4 table values in a
permuted order (i.e. it assumes they are 0.25, -1/ln2, -0.333333, 0.5)
but the corresponding constants in _pow_coef1 have not been permuted.
Because of this, all computations where 2^31 < y < 2^64 produced
incorrect results. This error still appears to be present in the latest
version of the code. So, I assume you have also never tested that range
of values?
For example, try x = 1.0000000001, y = (double)0x1_1000_0000L and
compare the result with that obtained from the StrictMath routine.
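That comparison can be run as a small standalone Java program (the
inputs are the ones suggested above; the ulp tolerance is my own
framing of "should agree", based on Math.pow's documented 1 ulp
accuracy requirement):

```java
public class HugeYCheck {
    public static void main(String[] args) {
        // y is chosen so that 2^31 < y < 2^64, which forces the
        // Y_IS_HUGE branch of the intrinsic.
        double x = 1.0000000001;
        double y = (double) 0x1_1000_0000L;   // 4563402752.0

        double fast = Math.pow(x, y);         // may use the intrinsic
        double strict = StrictMath.pow(x, y); // fdlibm reference result

        System.out.println("Math.pow       = " + fast);
        System.out.println("StrictMath.pow = " + strict);
        // On a correct build the two values should differ by at most a
        // few ulps; the permuted-constant bug produces a result that is
        // far outside that tolerance.
    }
}
```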
I did explicitly ask you to ensure that all paths through the code were
tested in an earlier posting. Once I had read through and understood the
code it took me about 2 minutes to produce a test program that exercised
each of these 2 broken paths (and about 1/2 hour in gdb to detect and
fix each problem). I'm very under-impressed that you did not bother to
produce such tests as part of your test regime.
These errors are not bizarre or unexpected corner cases. They are paths
that can be expected to be taken during normal computations. Testing the
code requires, at the very least, driving the code through such paths
with suitable inputs and then ensuring the results are valid, so you
should really have looked for and found these bugs.
Of course, testing also requires identifying and checking bizarre or
unexpected corner cases. However, while it might be understandable if
some of those latter cases were missed, there is really no excuse for
not checking the known, expected paths with at least some inputs. It's
extremely unhelpful to those who have to maintain this code when a
contributor takes the job of testing this lightly. Nor is such behaviour
going to encourage anyone to accept further contributions.
You can expect a full review of the code some time today (it will be
based on the previous version so you will have to make allowance for the
changes you have made in the latest version). There are only a few
things I would like you to tweak in the sequence of generated
instructions. However, you will need to do a lot of rework to make the
generator code more systematic and more readable. That includes 1)
introducing some block structure and local declarations to the generator
code and 2) adopting a more coherent allocation of values to registers
which reflects the naming and local usage of variables in the original
algorithm. To simplify that task I will provide a revised algorithm
which faithfully reflects the structure of your generated code and
clarifies its relation to the original.
regards,
Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander