[aarch64-port-dev ] RFR: 8189107 - AARCH64: create intrinsic for pow
Dmitrij Pochepko
dmitrij.pochepko at bell-sw.com
Fri Sep 7 16:42:41 UTC 2018
Hi Andrew,
Thank you again for looking into it in such details. It will take me
some time to review your draft with comments related to original code.
Looking forward to work on improving the code and algorithm description
after that.
Small note though: since you're adding documentation to original code,
it probably would make sense to update it in original location as well
at src/hotspot/share/runtime/sharedRuntimeTrans.cpp
Thanks,
Dmitrij
On 07/09/18 15:58, Andrew Dinn wrote:
> Hi Dmitrij
>
> On 22/08/18 11:04, Andrew Dinn wrote:
>> Thank you for the revised webrev and new test results. I am now working
>> through them. I will post comments as soon as I have given the new code
>> a full read and assessed the new results. I am afraid that may take a
>> day or two, for which delay advance apologies.
> This review has taken a great deal longer than expected. I am sorry but
> that is because the documentation for the code you have submitted is
> still seriously inadequate and I have had to put a lot of work into
> revising it before I can fully review the code.
>
> I am still finishing off that last task but I wanted to start providing
> you with some feedback and also to enlist your help in checking that my
> revisions are correct. I plan to provide feedback in 3 stages to match
> the 3 steps in the review that I am doing as follows:
>
> 1) Correct the original 'algorithm' you started from
>
> 2) Correct the 'modified algorithm' that is meant to describe the
> behaviour of your code
>
> 3) Propose any necessary corrections/improvements to the generated code
>
> So, let's start with step 1.
>
> The 'original' algorithm located in file macroAssembler_aarch64_pow.cpp
> is really just a fragment of C code with a few missing elements (e.g.
> the origin of values P1, P2, ... is not explained, hugeX, tiny are not
> defined). Although this code as the virtue that it is known to be
> correct (or at least has been verified by long use and the eyes of
> experts in numerical computation) it still fails to provide important
> information about what the 'algorithm' is supposed to do. That
> information is critical for anyone coming to it fresh to be able
> understand what is happening.
>
> The first omission is several pieces of background mathematics that are
> neither explained in the code nor referenced. The mathematics includes
> the formulae on which the algorithm is based and the numerical
> approximation to these formulae that is employed to define the
> algorithm. This is needed to explain /how/ and /why/ a) the two
> different computations of log2(x) and b) the computation of exp(x) are
> performed as they are and to justify that the results are valid.
>
> The second omission is detailed descriptions of what most of the more
> complex individual steps in the algorithm do. Many of the logic,
> floating point and branching operations which compute intermediate
> results are extremely opaque. This is particularly so for the steps
> which manipulate bit patterns in the long representation of the fp
> values being used. However, some of the straight fp arithmetic is also
> highly problematic.
>
> The other thing I think needs to be made clearer is the relationship
> between the various special case return points in the code and the
> special case rules they relate to. This is not so critical for the
> original algorithm because the C code at least has a regular and
> standardised control flow. However, labelling the exit paths is still
> useful here and will be much more useful if used both here and in the
> modified algorithm (and we'll come to that later).
>
> I have rewritten the algorithm to achieve what I think is needed to
> patch these omissions. The redraft of this part of the code is available
> here:
>
> http://cr.openjdk.java.net/~adinn/8189107/original_algorithm.txt
>
> I assume you are familiar with the relevant mathematics and how it has
> been used to derive the algorithm. If so then I would like you to review
> this rewrite and ensure that there are nor mathematical errors in it. I
> would also like you to check that the explanatory comments for of the
> individual steps in the algorithm do not contain any errors.
>
> If you are not familiar with the mathematics then please let me know. I
> need to know whether this has been reviewed bu someone competent to do so.
>
> n.b. one little detail you might easily miss. I removed lg2, lg2_h and
> lg2_l from the first table of constants as neither log(x) algorithm
> needs them (it relies on ivln2). I renamed the entries in the second
> table from LG2, etc to /ln2/, etc and change the name accordingly at
> point of use. The computation of exp(x) actually does need ln2. One of
> the code changes is to remove these redundant entries from your table
> pow_coeff1.
>
> Ok, as for the next 2 steps will post a follow-up to deal with them once
> I have completed my review. That will include a heavily revised version
> of your 'modified algorithm' (which is still in progress) plus
> suggestions for changes to the code that I have found along the way.
> Just as a preliminary I'll summarize what is wrong below.
>
> Note that I have not yet found any errors in how the generated code
> implements the mathematics but I am still not happy with it because it
> is extremely unclear. Correcting the 'modified algorithm' is a
> necessary, critical step to improvimg the clarity of the code.
>
> So, in overview, what is wrong with your 'modified algorithm'. Well, the
> thing that is immediately obvious is that it is /not/ actually the
> algorithm you have employed. It is simply a mangled version of the C
> code you started from that bears only a tenuous relation to the code
> structure it is supposed to summarize. Now, I'm happy for you to use C
> to model the generated code if possible and, in fact, am in the process
> of writing a proper algorithm that looks as much like C code as possible
> /but/ also actually describes what your generated code does. The problem
> is that what you have written is not only /not/ C it is also i)
> incoherent, ii) retains elements from the original code that don't exist
> at all in the generated code and iii) omits important elements of the
> generated code.
>
> So, firstly, let's deal with the problem as it relates to control flow.
> Your 'modified algorithm' includes various tags mentioning the word
> 'label' suggesting that some transfer of control is to be effected.
> However, these are tacked onto statement blocks connected via 'if
> (cond)' tests or 'else' keywords that are meant to imply some
> alternative control flow. Essentially, your generated code relies on
> gotos which do not fit a standard if/else flow model and you have tried
> to bodge some sort of goto model on top of the original valid, gotoless
> C control flow with no clear definition of how that is meant to work.
> Honestly, if your generated code uses a goto control flow then your C
> algorithm is going to have to do the same in order to clearly summarize
> what the code actually does.
>
> The second major problem is one I pointed out in my earlier note, i.e.
> that the data values described in the 'modified algorithm' do not
> correctly match the ones operated on in your generated code. Your
> algorithm lists many redundant values used in the original algorithm
> (e.g. ix, iy, ax, yisint) even though your code doesn't ever explicitly
> construct most of those values (n.b. this but not just limited to the 32
> bit half-word quantities). Instead the code frequently pulls the
> relevant value, as needed, out of other data that it does construct and
> holds in registers -- sometimes across control branches. At other times
> it performs an equivalent operation on a different, related data value.
> Your response to my request was to add comments which labelled some of
> these on-the-fly created values or alternative values with the original
> names but that ignores the fact that the names and the values referenced
> in the comments do not actually match.
>
> Contrariwise, a lot of the values the code does actually operate on are
> not mentioned in the algorithm. Indeed, it is worse than that because
> they are not coherently identified even in the generated code. Data
> items stored in registers are referred to using the utterly redundant
> symbolic aliases tmp0, tmp2, etc for registers r0, r1 etc. What is worse
> the same meaningless symbolic names get reused for completely different
> data items.
>
> For example, at one point tmp2 identifies the exponent of y stored in r2
> and later it identifies the absolute value of y also stored in r2,
> overwriting the exponent. Your algorithm really ought to mention values
> like exp_y or ay (or even |y|) for these cases and the code should
> correspondingly define exp_y and ay as an alias for register r2. These
> meaningful names should then be used when loading the constructed value
> into a register and at every subsequent point of use where that
> constructed value is valid.
>
> This is not all that is wrong with the 'modified algorithm' but it is
> enough to make it not just useless but worse than useless. What you have
> written provides a hand-wave towards what the code does that fails to
> summarize it with any accuracy or clarity and equally fails to clarify
> the difference between it and the C code you started from. That only
> makes the whole picture less clear not more so.
>
> As I said, I will provide a better version of the 'modified algorithm'
> in a follow-up and then discuss possible code changes. Please review the
> linked file above while I prepare that.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
More information about the aarch64-port-dev
mailing list