RFR: 8189105: AARCH64: create intrinsic for sin and cos

Wed Jun 13 17:44:30 UTC 2018

Andrew,

thank you for a detailed outline of the necessary changes.

I'll get back to you in a few days with an updated version.

Thanks,

Dmitrij

On 11.06.2018 18:07, Andrew Dinn wrote:
> Hi Dimitrij,
>
> Apologies for not responding -- I have only just returned from holiday.
>
> On 04/06/18 16:50, Dmitrij Pochepko wrote:
>> On 04.06.2018 13:59, Andrew Haley wrote:
>>>>   . . .
>>>>   > I don't really know how to review this.  It's a lot of hand-carved
>>>>   > assembly code and it's very hard to read.  It'd need to be very
>>>>   > thoroughly tested.
>>>>
>>>> It is a vectorized version of existing C code algorithm,
>>> Which existing C code algorithm?
>> this is a hotspot copy of fdlibm trigonometric functions in
>> hotspot/src/share/runtime/sharedRuntimeTrig.cpp
>> I added respective comment in intrinsic.
> Thanks for providing this code and the additional comments. However, I
> don't think this approaches anywhere near the requirements Andrew set
> out in his initial review, requirements which I agree are absolutely
> necessary. I can only repeat his recommendation that you look at the
> Montgomery multiply code in stubGenerator_aarch64.cpp to get an idea of
> the sort of detail that is needed to satisfactorily document an
> implementation this complex.
>
> The two most important goals are to be able to
>
>    1) review the current code with a high degree of confidence that it is
> correct
>
>    2) maintain this code without imposing an intolerable burden on
> developers who might need to check it months from now
>
> Review will only be possible if the code is written in a way that makes
> what it is doing clear enough to be able to judge that it is correct.
> That is far from being the case as it stands. Saying it is the same as
> the C code modulo a few optimizations is nothing like enough. Visibly
> identifying how the various register values and insns/blocks correspond
> to their original C source lines would be a big improvement. Explaining
> what these values represent as they are updated and what each insn/block
> achieves would be even better. Better still would be to define macro
> instructions in the assembler so that the generation can be viewed at
> the level of more abstract operations that reflect the numerical and
> mathematical transformations being performed. Finally, relating the
> whole to the underlying mathematical theory would be the cherry on the
> cake. Note that the original C code has comments with this level of
> detail and that code is far easier to follow than this assembler. So,
> all the more reason for the same or more commentary here.
>
> Maintenance is an issue *whether or not* the current code is correct.
> Someone will inevitably at some point be faced with the following
> dilemma: possibly, locate a bug in the current implementation; or, more
> likely, reassure yourself that the code is correct before proceeding to
> diagnose the real problem elsewhere. The current code is a fly-trap and
> a piece of software as large and mature as OpenJDK has more than enough
> complexity without adding fly-traps into the mix.
>
> Note that neither of these goals are going to be met by testing, no
> matter how exhaustive. Also, don't feel singled out by these reviews.
> Andrew and I are not suggesting this as a matter of form. We know how
> important this is because we have tested and reviewed*** an immense
> amount of much less complex code in getting this port to where it is now
> and learned a great deal from that experience. We have only been able to
> identify and fix the many bugs that have turned up /after/ extensive
> review because we tried very hard to keep the code as clear and well
> documented as possible (n.b. we also added lots of asserts to validate
> assumptions).
>
> That is not something specific that we did off our own bat. It is
> actually the expected standard for JVM code, shared or cpu-specific.
> That standard has not always been lived up to but it is steadily
> improving. The Intel versions of these intrinsics in the x86 are indeed
> a notable lapse (I believe they were culled from Intel's C compiler
> output). However, their failings do not excuse repeating the same
> mistake for AArch64.
>
> *** n.b contrary to assumptions expressed in some recent threads Andrew
> is not the only reviewer for AArch64 code. In particular, every line of
> his code has been reviewed by me and/or a handful of other reviewers
> from the day we started this port.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander