[aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp)
Dmitry Chuyko
dmitry.chuyko at bell-sw.com
Tue Aug 18 15:05:01 UTC 2020
Hi Andrew,
Thanks for taking a look.
This work started as an attempt to improve the common code, see JDK-8249198
[1] and the short related discussion [2]. The original benchmark [3] is
quite similar to the one that you used.
Since you kindly tried the patch on hardware where it shows a degradation
(the baseline is quite slow, btw), I think it makes sense to limit it to
Cortex/Neoverse. So I restored the UseSignumIntrinsic flag, which is enabled
only for CPU_ARM. Disabling InlineMathNatives also disables it.
webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.02/
As suggested by Andrew Dinn, there are a few more test cases in the test:
+-MIN_NORMAL and some denormal numbers.
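For reference, the edge cases above can be exercised with a small standalone
check. This is only a sketch: the class SignumEdgeCases and its methods are
hypothetical names, not the ones used in the webrev test, and expected()
simply restates the documented Math.signum contract.

```java
// Illustrative only: names are hypothetical, not taken from the webrev.
final class SignumEdgeCases {
    // Inputs around zero, the smallest normal, and the denormal range.
    static final double[] INPUTS = {
        0.0, -0.0,
        Double.MIN_NORMAL, -Double.MIN_NORMAL,
        Double.MIN_VALUE, -Double.MIN_VALUE,            // smallest denormals
        Double.MIN_NORMAL / 2, -Double.MIN_NORMAL / 2,  // larger denormals
        Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
        Double.NaN
    };

    // Reference result per the Math.signum contract:
    // zeros and NaN are returned unchanged, everything else maps to +-1.0.
    static double expected(double d) {
        return (d == 0.0 || Double.isNaN(d)) ? d : (d > 0.0 ? 1.0 : -1.0);
    }

    // Compare bit patterns so that -0.0 differs from 0.0 and NaN matches NaN.
    static boolean check(double d) {
        return Double.doubleToLongBits(Math.signum(d))
            == Double.doubleToLongBits(expected(d));
    }

    public static void main(String[] args) {
        for (double d : INPUTS) {
            if (!check(d)) {
                throw new AssertionError("signum mismatch for " + d);
            }
        }
    }
}
```

Comparing bit patterns rather than using == matters here, because
0.0 == -0.0 is true and NaN == NaN is false.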
Some more results for a benchmark with reduce():
-XX:-UseSignumIntrinsic
DoubleOrigSignum.ofMostlyNaN   0.914 ± 0.001 ns/op
DoubleOrigSignum.ofMostlyNeg   1.178 ± 0.001 ns/op
DoubleOrigSignum.ofMostlyPos   1.176 ± 0.017 ns/op
DoubleOrigSignum.ofMostlyZero  0.803 ± 0.001 ns/op
DoubleOrigSignum.ofRandom      1.175 ± 0.012 ns/op

-XX:+UseSignumIntrinsic
DoubleOrigSignum.ofMostlyNaN   1.040 ± 0.007 ns/op
DoubleOrigSignum.ofMostlyNeg   1.040 ± 0.004 ns/op
DoubleOrigSignum.ofMostlyPos   1.039 ± 0.003 ns/op
DoubleOrigSignum.ofMostlyZero  1.040 ± 0.001 ns/op
DoubleOrigSignum.ofRandom      1.040 ± 0.003 ns/op
If we only intrinsify copySign(), we lose the free mask that we get from
facgt. In that case the improvement (for signum) drops from ~30% to
~15%, and it also depends greatly on the particular HW. We could
additionally introduce an intrinsic for Math.copySign(); that makes
particular sense for float, where it can be just 2 fp instructions:
movi+bsl (fmovd+fnegd+bsl for double).
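To make the role of the mask concrete, here is a rough Java model of the bit
selection. It is a sketch of the idea, not the generated code: the class
SignumBitSelect and its methods are hypothetical, select() stands in for bsl,
and the Math.abs(x) > 0.0 test stands in for the facgt comparison under the
assumption that it is false for zeros and NaN.

```java
// Illustrative model only: names are hypothetical and this is not the intrinsic itself.
final class SignumBitSelect {
    static final long SIGN_BIT = 0x8000000000000000L;

    // Bitwise select: take bits from a where mask is 1, from b where mask is 0.
    static long select(long mask, long a, long b) {
        return (a & mask) | (b & ~mask);
    }

    // copySign as a pure bit select: sign bit from 'sign', all other bits from 'magnitude'.
    static double copySign(double magnitude, double sign) {
        long m = Double.doubleToRawLongBits(magnitude);
        long s = Double.doubleToRawLongBits(sign);
        return Double.longBitsToDouble(select(SIGN_BIT, s, m));
    }

    // signum using the comparison mask: |x| > 0 is false for zeros and NaN,
    // so those inputs pass through unchanged. Otherwise the non-sign bits
    // come from 1.0 and the sign bit is kept from x.
    static double signum(double x) {
        long mask = (Math.abs(x) > 0.0) ? ~SIGN_BIT : 0L;
        long one = Double.doubleToRawLongBits(1.0);
        long bits = Double.doubleToRawLongBits(x);
        return Double.longBitsToDouble(select(mask, one, bits));
    }
}
```

In this model copySign(1.0, x) alone would turn zeros and NaN into +-1.0, so
signum built on it needs a separate check for those inputs, which is the work
the comparison mask otherwise provides for free.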
-Dmitry
[1] https://bugs.openjdk.java.net/browse/JDK-8249198
[2] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/067666.html
[3] http://cr.openjdk.java.net/~dchuyko/8249198/webrev.00/raw_files/new/test/micro/org/openjdk/bench/java/lang/DoubleSignum.java
On 8/15/20 4:50 PM, Andrew Haley wrote:
> I've been looking at the way Math.signum() is used, mostly by
> searching the GitHub code database. I've changed the JMH test to be
> IMO more realistic: it's at
> http://cr.openjdk.java.net/~aph/DoubleSignum.java. I think it's more
> realistic because signum() results usually aren't stored but are used
> to feed other arithmetic ops, usually + or *.
>
> Baseline:
>
> Benchmark Mode Cnt Score Error Units
> DoubleSignum.ofMostlyNaN avgt 3 2.409 ± 0.051 ns/op
> DoubleSignum.ofMostlyNeg avgt 3 2.475 ± 0.211 ns/op
> DoubleSignum.ofMostlyPos avgt 3 2.494 ± 0.015 ns/op
> DoubleSignum.ofMostlyZero avgt 3 2.501 ± 0.008 ns/op
> DoubleSignum.ofRandom avgt 3 2.458 ± 0.373 ns/op
> DoubleSignum.overhead avgt 3 2.373 ± 0.029 ns/op
>
> -XX:+UseSignumIntrinsic:
>
> Benchmark Mode Cnt Score Error Units
> DoubleSignum.ofMostlyNaN avgt 3 2.776 ± 0.006 ns/op
> DoubleSignum.ofMostlyNeg avgt 3 2.773 ± 0.066 ns/op
> DoubleSignum.ofMostlyPos avgt 3 2.772 ± 0.084 ns/op
> DoubleSignum.ofMostlyZero avgt 3 2.770 ± 0.045 ns/op
> DoubleSignum.ofRandom avgt 3 2.769 ± 0.005 ns/op
> DoubleSignum.overhead avgt 3 2.376 ± 0.013 ns/op
>
>
> I think it might be more useful for you to work on optimizing
> Math.copysign().
>