RFR: 8189105: AARCH64: create intrinsic for sin and cos

Dmitrij Pochepko dmitrij.pochepko at bell-sw.com
Mon Jun 4 15:50:37 UTC 2018



On 04.06.2018 13:59, Andrew Haley wrote:
> On 06/02/2018 05:50 PM, Dmitrij Pochepko wrote:
>> (mail client sent as html, resending as plain text)
>>
>>   > On 06/01/2018 02:21 PM, Dmitrij Pochepko wrote:
>>   >> Also I launched respective JCK tests (api/java_lang/Math/sin* and
>>   >> api/java_lang/Math/cos*) in both Xmixed and Xcomp modes
>>   >>
>>   >> All passed, no failures found.
>>   >>
>>   >>
>>   >> webrev: http://cr.openjdk.java.net/~dpochepk/8189105/webrev.01/
>>   >>
>>   >> CR: https://bugs.openjdk.java.net/browse/JDK-8189105
>>   >
>>   > I don't really know how to review this.  It's a lot of hand-carved
>>   > assembly code and it's very hard to read.  It'd need to be very
>>   > thoroughly tested.
>>
>> It is a vectorized version of existing C code algorithm,
> Which existing C cpde algorithm?
this is a hotspot copy of fdlibm trigonometric functions in 
hotspot/src/share/runtime/sharedRuntimeTrig.cpp
I added respective comment in intrinsic.
>
>> so existing jtreg tests specifically written for it are best
>> suitable to test it and were found to be sufficient. During
>> development these test cases were very helpful since they cover all
>> the branches of this algorithm. All relevant JCK tests also
>> passed. The comments in this intrinsic second these in C code for
>> ease of maintenance.
>>
>>   >
>>   > I hope you don't propose to rewrite the entire runtime library by
>>   > hand.  If you do, we need to have a proper discussion here about the
>>   > risk:reward ratio.
>>
>> I agree with you and I definitely don't propose that. I think it
>> needs to be decided on a case-by-case basis (for example, we took an
>> algorithm for log() from other HotSpot ports because it turned out
>> to be simpler and faster then vectorized libm, which I also
>> implemented). For sin/cos we're seeing consistent and significant
>> improvement across all AARCH64 microarchitectures we have access
>> to. This code is also very isolated, and the risk:reward ratio here
>> IMO is worth it.
>>
>> It would be difficult to justify why AARCH64 should suffer from
>> absence of intrinsics implemented in other ports. For sin/cos we are
>> in a much better position then other ports which introduced new
>> algorithms without tests specifically crafted for it (and these
>> patches were accepted). I first started to implement sin/cos as it
>> is done for x86 but abandoned it as the code would be more difficult
>> to maintain and the improvement was not as significant.
> Indeed, so let's try to make this as good as we can.  If this is going
> to be maintained we need more documentation.  Please provide full
> pseudocode for the program; if the original C algorithm is completely
> unchanged that'll be fine, but if you have changed it in any way
> please change the C accordingly.
There were no changes in algorithm. Just vectorized version of code in 
hotspot/src/share/runtime/sharedRuntimeTrig.cpp with minor optimizations.
I created detailed comment in macroAssembler_aarch64_trig.cpp which 
explains how this optimized version is different from the original. I 
don't think it's convenient to copy all C code, because 
macroAssembler_aarch64_trig.cpp is basically optimized assembler to 
whole 890-lines-long sharedRuntimeTrig.cpp except dtan function.
>
> Also, note that on x86 everything (C1, C2, template interpreter) calls
> the intrinsic.  This is because we do not want to see any difference,
> even in a single bit, between them.  It is essential that we follow
> x86 in this regard for all intrinsics.  You can just copy the x86 code
> to do this.
>
You're right.
I created separate CR to enable all possible math intrinsics usage for 
C1 and interpreter: https://bugs.openjdk.java.net/browse/JDK-8204289, 
which is independent from sin/cos and log intrinsic code.
And I also minimally updated sin/cos patch. Respective generation call 
is moved from generate_all() to generate_initial() for interpreter to 
use initialized stub routine address.

Please take a look at webrev.02: 
http://cr.openjdk.java.net/~dpochepk/8189105/webrev.02/

Thanks,
Dmitrij
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180604/36ea1dcc/attachment-0001.html>


More information about the hotspot-compiler-dev mailing list