[aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp)
Andrew Haley
aph at redhat.com
Tue Aug 25 16:55:38 UTC 2020
On 24/08/2020 22:52, Dmitry Chuyko wrote:
>
> I added two more intrinsics -- for copySign, they are controlled by
> UseCopySignIntrinsic flag.
>
> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/
>
> It also contains 'benchmarks' directory:
> http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/benchmarks/
>
> There are 8 benchmarks there: (double | float) x (blackhole | reduce) x
> (current j.l.Math.signum | abs()>0 check).
>
> My results on Arm are in signum-facgt-copysign.ods. Main case is
> 'random' which is actually a random from positive and negative numbers
> between -0.5 and +0.5.
>
> Basically we have ~14% improvement in 'reduce' benchmark variant but
> ~20% regression in 'blackhole' variant in case of only copySign()
> intrinsified.
>
> Same picture if abs()>0 check is used in signum() (+-5%). This variant
> is included as it shows very good results on x86.
>
> Intrinsic for signum() gives improvement of main case in both
> 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a
> noticeable difference.
Ignoring Blackhole for the moment, this is what I'm seeing for the
reduction/random case:
Benchmark                   Mode  Cnt  Score   Error  Units

ThunderX 2:

-XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic
DoubleReduceBench.ofRandom  avgt    3  2.456 ± 0.065  ns/op
-XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic
DoubleReduceBench.ofRandom  avgt    3  2.766 ± 0.107  ns/op
-XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic
DoubleReduceBench.ofRandom  avgt    3  2.537 ± 0.770  ns/op

Neoverse N1 (actually Amazon m6g.16xlarge):

-XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic
DoubleReduceBench.ofRandom  avgt    3  1.173 ± 0.001  ns/op
-XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic
DoubleReduceBench.ofRandom  avgt    3  1.043 ± 0.022  ns/op
-XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic
DoubleReduceBench.ofRandom  avgt    3  1.012 ± 0.001  ns/op
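
The relative change of each flag combination against the baseline (both
intrinsics off) can be worked out directly from the scores above. A
minimal sketch: the class and method names are made up for illustration,
and the only data used are the six scores quoted in this message.

```java
import java.util.Locale;

// Illustrative helper, not part of the webrev: relative change in ns/op
// versus the run with both intrinsics disabled.
final class ReduceDeltas {
    // Positive result = slower than baseline, negative = faster.
    static double percentChange(double baseline, double score) {
        return (score - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] cpu    = { "ThunderX 2", "Neoverse N1" };
        double[] base   = { 2.456, 1.173 };  // -UseSignumIntrinsic -UseCopySignIntrinsic
        double[] signum = { 2.766, 1.043 };  // +UseSignumIntrinsic -UseCopySignIntrinsic
        double[] copy   = { 2.537, 1.012 };  // -UseSignumIntrinsic +UseCopySignIntrinsic
        for (int i = 0; i < cpu.length; i++) {
            System.out.printf(Locale.ROOT,
                "%-12s signum: %+.1f%%  copySign: %+.1f%%%n",
                cpu[i],
                percentChange(base[i], signum[i]),
                percentChange(base[i], copy[i]));
        }
    }
}
```

On these figures the signum intrinsic comes out roughly 13% slower than
baseline on ThunderX 2 and roughly 11% faster on Neoverse N1. The
ThunderX 2 copySign run has an error of ± 0.770, so its difference from
baseline is within the noise.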
By your own numbers, in the reduce benchmark the signum intrinsic is
worse than default for the all-0 and NaN cases, but about 12% better
for the random, >0, and <0 cases. If you take the average of the
speedups and slowdowns it's actually worse than default.
By my reckoning, if you take all possibilities (NaN, <0, >0, 0,
Random) into account, the best performer on the reduce test is
actually abs/copySign, but there's very little in it. The only time
that the signum intrinsic actually wins is when you're storing the
result into memory *and* flushing the store buffer.
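
For anyone without the webrev to hand, the two Java-level variants being
compared look roughly like the sketch below. This is not the code under
review: the class and method names are hypothetical, and the first
method only paraphrases the shape of the current java.lang.Math.signum.

```java
// Illustrative only; names are hypothetical and not taken from the webrev.
final class SignumVariants {
    // Shape of the existing library code: pass zeros and NaN through,
    // otherwise return +/-1.0 carrying the sign of the argument.
    static double signumDefault(double d) {
        return (d == 0.0 || Double.isNaN(d)) ? d : Math.copySign(1.0, d);
    }

    // The "abs()>0 check" variant: abs(d) > 0 is false for +/-0.0 and
    // for NaN, so a single comparison covers both special cases.
    static double signumAbs(double d) {
        return Math.abs(d) > 0 ? Math.copySign(1.0, d) : d;
    }

    public static void main(String[] args) {
        double[] inputs = { -0.25, 0.25, 0.0, -0.0, Double.NaN };
        for (double d : inputs) {
            System.out.println(d + " -> " + signumDefault(d) + " / " + signumAbs(d));
        }
    }
}
```

Both methods should agree with Math.signum on every input, including
signed zeros and NaN; they differ only in how many comparisons sit in
front of the copySign.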
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671