RFR: 8189105: AARCH64: create intrinsic for sin and cos
Dmitrij Pochepko
dmitrij.pochepko at bell-sw.com
Mon Jun 4 15:50:37 UTC 2018
On 04.06.2018 13:59, Andrew Haley wrote:
> On 06/02/2018 05:50 PM, Dmitrij Pochepko wrote:
>> (mail client sent as html, resending as plain text)
>>
>> > On 06/01/2018 02:21 PM, Dmitrij Pochepko wrote:
>> >> Also I launched respective JCK tests (api/java_lang/Math/sin* and
>> >> api/java_lang/Math/cos*) in both Xmixed and Xcomp modes
>> >>
>> >> All passed, no failures found.
>> >>
>> >>
>> >> webrev: http://cr.openjdk.java.net/~dpochepk/8189105/webrev.01/
>> >>
>> >> CR: https://bugs.openjdk.java.net/browse/JDK-8189105
>> >
>> > I don't really know how to review this. It's a lot of hand-carved
>> > assembly code and it's very hard to read. It'd need to be very
>> > thoroughly tested.
>>
>> It is a vectorized version of existing C code algorithm,
> Which existing C cpde algorithm?
this is a hotspot copy of fdlibm trigonometric functions in
hotspot/src/share/runtime/sharedRuntimeTrig.cpp
I added respective comment in intrinsic.
>
>> so existing jtreg tests specifically written for it are best
>> suitable to test it and were found to be sufficient. During
>> development these test cases were very helpful since they cover all
>> the branches of this algorithm. All relevant JCK tests also
>> passed. The comments in this intrinsic second these in C code for
>> ease of maintenance.
>>
>> >
>> > I hope you don't propose to rewrite the entire runtime library by
>> > hand. If you do, we need to have a proper discussion here about the
>> > risk:reward ratio.
>>
>> I agree with you and I definitely don't propose that. I think it
>> needs to be decided on a case-by-case basis (for example, we took an
>> algorithm for log() from other HotSpot ports because it turned out
>> to be simpler and faster then vectorized libm, which I also
>> implemented). For sin/cos we're seeing consistent and significant
>> improvement across all AARCH64 microarchitectures we have access
>> to. This code is also very isolated, and the risk:reward ratio here
>> IMO is worth it.
>>
>> It would be difficult to justify why AARCH64 should suffer from
>> absence of intrinsics implemented in other ports. For sin/cos we are
>> in a much better position then other ports which introduced new
>> algorithms without tests specifically crafted for it (and these
>> patches were accepted). I first started to implement sin/cos as it
>> is done for x86 but abandoned it as the code would be more difficult
>> to maintain and the improvement was not as significant.
> Indeed, so let's try to make this as good as we can. If this is going
> to be maintained we need more documentation. Please provide full
> pseudocode for the program; if the original C algorithm is completely
> unchanged that'll be fine, but if you have changed it in any way
> please change the C accordingly.
There were no changes in algorithm. Just vectorized version of code in
hotspot/src/share/runtime/sharedRuntimeTrig.cpp with minor optimizations.
I created detailed comment in macroAssembler_aarch64_trig.cpp which
explains how this optimized version is different from the original. I
don't think it's convenient to copy all C code, because
macroAssembler_aarch64_trig.cpp is basically optimized assembler to
whole 890-lines-long sharedRuntimeTrig.cpp except dtan function.
>
> Also, note that on x86 everything (C1, C2, template interpreter) calls
> the intrinsic. This is because we do not want to see any difference,
> even in a single bit, between them. It is essential that we follow
> x86 in this regard for all intrinsics. You can just copy the x86 code
> to do this.
>
You're right.
I created separate CR to enable all possible math intrinsics usage for
C1 and interpreter: https://bugs.openjdk.java.net/browse/JDK-8204289,
which is independent from sin/cos and log intrinsic code.
And I also minimally updated sin/cos patch. Respective generation call
is moved from generate_all() to generate_initial() for interpreter to
use initialized stub routine address.
Please take a look at webrev.02:
http://cr.openjdk.java.net/~dpochepk/8189105/webrev.02/
Thanks,
Dmitrij
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180604/36ea1dcc/attachment-0001.html>
More information about the hotspot-compiler-dev
mailing list