[10] RFR: 8186915 - AARCH64: Intrinsify squareToLen and mulAdd

Fri Sep 22 08:12:23 UTC 2017

On 21/09/17 19:19, Dmitrij Pochepko wrote:

> Thank you for looking into this and trying it on APM (I have no
> access to this h/w).
> 
> 
> I've taken the modified benchmark you sent and run it on ThunderX;
> implSquareToLen still shows better results than implMultiplyToLen in
> most cases (up to 10% at size=127; results:
> http://cr.openjdk.java.net/~dpochepk/8186915/ThunderX_new.txt).

For 10%, it's not worth doing, given the risks and that it's not used
by crypto operations when C2-compiled.

> However, since the performance difference on APM is larger than on
> ThunderX, I think it would be more logical to go back to your idea
> and call the multiplyToLen intrinsic inside squareToLen. An
> alternative is to generate different code for APM and ThunderX, but
> I prefer a single version given such a relatively small difference
> in performance, and it's still much faster than having no intrinsic
> at all.  What do you think?

Yes.  Calling multiplyToLen would be fine.
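
For illustration only, here is a minimal pure-Java sketch of that idea:
squaring implemented by handing the same magnitude to a schoolbook
multiply as both operands.  The class name, the simplified signatures
(no caller-supplied result array) and the toBigInteger helper are
assumptions made for this example; this is not the HotSpot stub code,
only a model of the arithmetic it would have to preserve.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Random;

// Hypothetical demo class; names and signatures are illustrative only.
final class SquareViaMultiply {
    private static final long LONG_MASK = 0xffffffffL;

    // Schoolbook multiply of big-endian magnitudes x[0..xlen) and y[0..ylen).
    // Stand-in for the multiplyToLen intrinsic; always allocates the result.
    static int[] multiplyToLen(int[] x, int xlen, int[] y, int ylen) {
        int[] z = new int[xlen + ylen];
        for (int i = xlen - 1; i >= 0; i--) {
            long xi = x[i] & LONG_MASK;
            long carry = 0;
            for (int j = ylen - 1, k = i + j + 1; j >= 0; j--, k--) {
                // Max value is (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so no overflow.
                long product = (y[j] & LONG_MASK) * xi + (z[k] & LONG_MASK) + carry;
                z[k] = (int) product;
                carry = product >>> 32;
            }
            z[i] = (int) carry;  // z[i] has not been written by earlier rows
        }
        return z;
    }

    // Squaring expressed as a multiply of x by itself.
    static int[] squareToLen(int[] x, int len) {
        return multiplyToLen(x, len, x, len);
    }

    // Interpret a big-endian int[] magnitude as a non-negative BigInteger.
    static BigInteger toBigInteger(int[] mag) {
        ByteBuffer buf = ByteBuffer.allocate(mag.length * 4);  // big-endian by default
        buf.asIntBuffer().put(mag);
        return new BigInteger(1, buf.array());
    }

    public static void main(String[] args) {
        Random rnd = new Random(8186915L);
        for (int len = 1; len <= 127; len++) {
            int[] x = new int[len];
            for (int i = 0; i < len; i++) {
                x[i] = rnd.nextInt();
            }
            BigInteger expected = toBigInteger(x).pow(2);
            BigInteger actual = toBigInteger(squareToLen(x, len));
            if (!expected.equals(actual)) {
                throw new AssertionError("mismatch at len=" + len);
            }
        }
        System.out.println("square via multiply matches BigInteger for len 1..127");
    }
}
```

The cost of this approach is that the multiply computes every cross
product x[i]*x[j] twice, where a dedicated squaring routine can compute
it once and double it; that is the trade-off the benchmark numbers
above are measuring.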

> fyi: regarding sizes 200 and 1000 - it's incorrect to measure these
> sizes for squareToLen, because squareToLen is never called for sizes
> above 127 (I've mentioned this before).

It's not incorrect: it's a test for asymptotic behaviour.
-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671