[aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp)

Tue Aug 18 15:05:01 UTC 2020

Hi Andrew,

Thanks for taking a look.

This work started as an attempt to improve the common code; see JDK-8249198 
[1] and the short related discussion [2]. The original benchmark [3] is 
quite similar to the one that you used.

Since you kindly tried the patch on hardware where it shows a degradation 
(the baseline there is quite slow, btw), I think it makes sense to limit it 
to Cortex/Neoverse. So I restored the UseSignumIntrinsic flag, which is 
enabled only for CPU_ARM. Disabling InlineMathNatives also disables it.

webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.02/

As suggested by Andrew Dinn, there are a few more test cases in the test: 
+-MIN_NORMAL and some denormal numbers.
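
To illustrate what such edge cases check, here is a minimal standalone sketch. It is not the test from the webrev; the class and method names are hypothetical. It compares Math.signum() against the semantics from its Javadoc (NaN and +-0.0 are returned unchanged, everything else maps to +-1.0) for +-MIN_NORMAL, a few denormals, and the other special values.

```java
public class SignumEdgeCases {
    // Reference semantics per the Math.signum Javadoc:
    // NaN and +-0.0 are returned as-is, anything else becomes +-1.0.
    static double expectedSignum(double d) {
        if (Double.isNaN(d) || d == 0.0) {
            return d;
        }
        return d > 0.0 ? 1.0 : -1.0;
    }

    // Compare bit patterns so that -0.0 and +0.0 are told apart;
    // doubleToLongBits canonicalizes NaN, so any NaN matches any NaN.
    static boolean matches(double d) {
        return Double.doubleToLongBits(Math.signum(d))
            == Double.doubleToLongBits(expectedSignum(d));
    }

    // Hypothetical input set: smallest normals, some denormals, specials.
    static final double[] INPUTS = {
        Double.MIN_NORMAL, -Double.MIN_NORMAL,
        Double.MIN_NORMAL / 2, -Double.MIN_NORMAL / 2,   // denormal
        Math.nextDown(Double.MIN_NORMAL),                // largest denormal
        Double.MIN_VALUE, -Double.MIN_VALUE,             // smallest denormal
        0.0, -0.0, Double.NaN,
        Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY
    };

    static boolean allMatch() {
        for (double d : INPUTS) {
            if (!matches(d)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (double d : INPUTS) {
            System.out.println(d + " -> " + Math.signum(d)
                + (matches(d) ? "" : "  MISMATCH"));
        }
    }
}
```

Running it with and without the intrinsic enabled should give the same lines if the intrinsic preserves the library semantics.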

Some more results for a benchmark with reduce():

-XX:-UseSignumIntrinsic
DoubleOrigSignum.ofMostlyNaN   0.914 ±  0.001  ns/op
DoubleOrigSignum.ofMostlyNeg   1.178 ±  0.001  ns/op
DoubleOrigSignum.ofMostlyPos   1.176 ±  0.017  ns/op
DoubleOrigSignum.ofMostlyZero  0.803 ±  0.001  ns/op
DoubleOrigSignum.ofRandom      1.175 ±  0.012  ns/op
-XX:+UseSignumIntrinsic
DoubleOrigSignum.ofMostlyNaN   1.040 ±  0.007  ns/op
DoubleOrigSignum.ofMostlyNeg   1.040 ±  0.004  ns/op
DoubleOrigSignum.ofMostlyPos   1.039 ±  0.003  ns/op
DoubleOrigSignum.ofMostlyZero  1.040 ±  0.001  ns/op
DoubleOrigSignum.ofRandom      1.040 ±  0.003  ns/op

If we only intrinsify copySign(), we lose the free mask that we get from 
facgt. In that case the improvement (for signum) drops from ~30% to 
~15%, and it also depends greatly on the particular HW. We can 
additionally introduce an intrinsic for Math.copySign(); it makes sense 
especially for float, where it can be just 2 fp instructions: movi+bsl 
(fmovd+fnegd+bsl for double).
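
To make the mask idea concrete, here is a rough Java model of the bit selection. It is a sketch under assumptions, not the code generated by the patch: the class and method names are hypothetical, the comparison stands in for facgt (|d| > |0.0|, false for +-0.0 and NaN), and the and/or expression stands in for bsl.

```java
public class MaskSelectSketch {
    private static final long SIGN_BIT = 0x8000000000000000L;

    // bsl-like select: take bits from a where mask is 1, from b where it is 0.
    static long select(long mask, long a, long b) {
        return (a & mask) | (b & ~mask);
    }

    // signum modeled as "compare, then select".
    static double signumViaMask(double d) {
        long bits = Double.doubleToRawLongBits(d);
        long one = Double.doubleToRawLongBits(1.0);
        // Stand-in for facgt: all ones if |d| > 0, zero for +-0.0 and NaN.
        long mask = (Math.abs(d) > 0.0) ? -1L : 0L;
        // Clear the top bit of the mask so the sign always comes from d.
        mask >>>= 1;
        // Zero mask: d passes through unchanged (+-0.0, NaN).
        // Otherwise: magnitude of 1.0 combined with the sign of d.
        return Double.longBitsToDouble(select(mask, one, bits));
    }

    // copySign modeled as a select on the sign bit alone; no compare is
    // needed, which is why a compare-produced mask is not available here.
    static double copySignViaMask(double magnitude, double sign) {
        long m = Double.doubleToRawLongBits(magnitude);
        long s = Double.doubleToRawLongBits(sign);
        return Double.longBitsToDouble(select(SIGN_BIT, s, m));
    }

    public static void main(String[] args) {
        double[] inputs = { -2.5, 3.0, 0.0, -0.0, Double.NaN, -Double.MIN_VALUE };
        for (double d : inputs) {
            System.out.println(d + ": signum=" + signumViaMask(d)
                + " copySign(1.0, d)=" + copySignViaMask(1.0, d));
        }
    }
}
```

In this model signum built on top of copySign() would still need a separate check for +-0.0 and NaN, whereas the compare-produced mask handles those cases in the same select.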

-Dmitry

[1] https://bugs.openjdk.java.net/browse/JDK-8249198
[2] 
https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/067666.html
[3] 
http://cr.openjdk.java.net/~dchuyko/8249198/webrev.00/raw_files/new/test/micro/org/openjdk/bench/java/lang/DoubleSignum.java

On 8/15/20 4:50 PM, Andrew Haley wrote:
> I've been looking at the way Math.signum() is used, mostly by
> searching the GitHub code database. I've changed the JMH test to be
> IMO more realistic: it's at
> http://cr.openjdk.java.net/~aph/DoubleSignum.java. I think it's more
> realistic because signum() results usually aren't stored but are used
> to feed other arithmetic ops, usually + or *.
>
> Baseline:
>
> Benchmark                  Mode  Cnt  Score   Error  Units
> DoubleSignum.ofMostlyNaN   avgt    3  2.409 ± 0.051  ns/op
> DoubleSignum.ofMostlyNeg   avgt    3  2.475 ± 0.211  ns/op
> DoubleSignum.ofMostlyPos   avgt    3  2.494 ± 0.015  ns/op
> DoubleSignum.ofMostlyZero  avgt    3  2.501 ± 0.008  ns/op
> DoubleSignum.ofRandom      avgt    3  2.458 ± 0.373  ns/op
> DoubleSignum.overhead      avgt    3  2.373 ± 0.029  ns/op
>
> -XX:+UseSignumIntrinsic:
>
> Benchmark                  Mode  Cnt  Score   Error  Units
> DoubleSignum.ofMostlyNaN   avgt    3  2.776 ± 0.006  ns/op
> DoubleSignum.ofMostlyNeg   avgt    3  2.773 ± 0.066  ns/op
> DoubleSignum.ofMostlyPos   avgt    3  2.772 ± 0.084  ns/op
> DoubleSignum.ofMostlyZero  avgt    3  2.770 ± 0.045  ns/op
> DoubleSignum.ofRandom      avgt    3  2.769 ± 0.005  ns/op
> DoubleSignum.overhead      avgt    3  2.376 ± 0.013  ns/op
>
>
> I think it might be more useful for you to work on optimizing
> Math.copysign().
>