[aarch64-port-dev ] RFR(M): 8212043: Add floating-point Math.min/max intrinsics
Andrew Haley
aph at redhat.com
Mon Oct 22 17:07:15 UTC 2018
On 10/22/2018 03:40 AM, Pengfei Li (Arm Technology China) wrote:
> I re-tested this JMH code manually on an AArch64 server just now.
> The findFmin() and findFmax() items do not have much performance
> gain (1.1x - 1.2x). But the findDmin() and findDmax() items are
> optimized a lot (about 29x - 30x). I don't understand why float and
> double differ so greatly. Maybe you could try it on your machine and
> see if you get a similar result.
Writing JMH benchmarks can be really difficult. C2 is an extremely
clever compiler, so you need to confuse it so thoroughly that it cannot
optimize your benchmark away completely. I have rewritten your
benchmark with that in mind; please find it at
http://cr.openjdk.java.net/~aph/8212043/TestFpMinMaxIntrinsics.java
Before:

Benchmark                        Mode  Cnt  Score   Error  Units
TestFpMinMaxIntrinsics.findDmax  avgt    3  9.626 ± 0.037  us/op
TestFpMinMaxIntrinsics.findDmin  avgt    3  9.688 ± 0.043  us/op
TestFpMinMaxIntrinsics.findFmax  avgt    3  9.351 ± 0.357  us/op
TestFpMinMaxIntrinsics.findFmin  avgt    3  9.483 ± 2.770  us/op

After:

Benchmark                        Mode  Cnt  Score   Error  Units
TestFpMinMaxIntrinsics.findDmax  avgt    3  5.384 ± 0.003  us/op
TestFpMinMaxIntrinsics.findDmin  avgt    3  5.382 ± 0.004  us/op
TestFpMinMaxIntrinsics.findFmax  avgt    3  5.383 ± 0.005  us/op
TestFpMinMaxIntrinsics.findFmin  avgt    3  5.384 ± 0.028  us/op
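
For readers without access to the linked file, here is a rough sketch of the kind of reduction loop such a benchmark measures. It is not the contents of TestFpMinMaxIntrinsics.java; the class name, SIZE and the seed are made up for illustration. The points it tries to show are that the inputs are random (so nothing is known at compile time) and that each result is returned (so the loop cannot be discarded as dead code).

```java
import java.util.Random;

public class FpMinMaxSketch {
    // Hypothetical array length; not taken from the original benchmark.
    static final int SIZE = 1000;

    final double[] dnums = new double[SIZE];
    final float[] fnums = new float[SIZE];

    FpMinMaxSketch(long seed) {
        // Random inputs: the compiler cannot fold the loops below to a constant.
        Random r = new Random(seed);
        for (int i = 0; i < SIZE; i++) {
            dnums[i] = r.nextDouble();
            fnums[i] = r.nextFloat();
        }
    }

    // In a JMH harness this method would carry @Benchmark. Returning the
    // accumulated value keeps the loop live.
    double findDmin() {
        double acc = dnums[0];
        for (int i = 1; i < SIZE; i++) {
            acc = Math.min(acc, dnums[i]);
        }
        return acc;
    }

    float findFmax() {
        float acc = fnums[0];
        for (int i = 1; i < SIZE; i++) {
            acc = Math.max(acc, fnums[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        FpMinMaxSketch s = new FpMinMaxSketch(42L);
        System.out.println(s.findDmin() + " " + s.findFmax());
    }
}
```

A plain main() like this is only a smoke test; timing it by hand would not give meaningful numbers, which is why the real measurements use JMH.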
Please consider if there are any situations in which your intrinsics
might make code slower. To see if this can happen I have written
another benchmark.
Here it is with -XX:-InlineMathNatives:

Benchmark                         (shuffle)  Mode  Cnt  Score   Error  Units
TestFpMinMaxIntrinsics2.findFmin      false  avgt    3  4.251 ± 0.003  us/op

and with -XX:+InlineMathNatives:

Benchmark                         (shuffle)  Mode  Cnt  Score   Error  Units
TestFpMinMaxIntrinsics2.findFmin      false  avgt    3  5.375 ± 0.001  us/op
The difference is the shuffle of the local variables. Is it likely to
be a common case that C2 can determine from its branch statistics that
a fast path can be highly optimized, and this visibility disappears
when we have an intrinsic? Should we do anything about that?
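
To make the branch-statistics question concrete, here is a small sketch. TestFpMinMaxIntrinsics2 is not shown in this message, so everything below is an assumption: the class, the hand-written branchyMin (which ignores NaN and signed zeros) and the reading of "shuffle" as shuffling the input data are all hypothetical. The idea is that with ordered input the comparison inside min always goes the same way, which is exactly the sort of bias a profile-guided compiler can exploit when it sees a branch and cannot see when the call is replaced by a single instruction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BranchProfileSketch {
    // Hand-written min with a visible branch. Simplified: no NaN or -0.0 handling.
    static float branchyMin(float a, float b) {
        return (a <= b) ? a : b;
    }

    // Descending values size..1, optionally shuffled with a fixed seed.
    static float[] makeInput(int size, boolean shuffle, long seed) {
        List<Float> vals = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            vals.add((float) (size - i));
        }
        if (shuffle) {
            Collections.shuffle(vals, new Random(seed));
        }
        float[] out = new float[size];
        for (int i = 0; i < size; i++) {
            out[i] = vals.get(i);
        }
        return out;
    }

    static float findFmin(float[] nums) {
        float acc = nums[0];
        for (int i = 1; i < nums.length; i++) {
            acc = branchyMin(acc, nums[i]);
        }
        return acc;
    }

    // Counts how often the running minimum is replaced, i.e. how often
    // the comparison in branchyMin picks its second argument.
    static int countUpdates(float[] nums) {
        int updates = 0;
        float acc = nums[0];
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] < acc) {
                acc = nums[i];
                updates++;
            }
        }
        return updates;
    }
}
```

With unshuffled descending input every iteration replaces the running minimum, so the branch is perfectly biased. With shuffled input the replacements become rare and irregular. Both inputs give the same minimum; only the branch behaviour differs.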
Please think also about constant propagation. This:
  @Benchmark
  public double constExpr() {
    double tmp = dnums[33];
    for (int i = 1; i < SIZE; i++) {
      tmp = min(dnums[27], min(0.1, min(1.1, min(2.1, min(3.1, min(4.1,
            min(5.1, min(6.1, min(7.1, min(8.1, min(9.1,
            dnums[12])))))))))));
    }
    return tmp;
  }
}
causes an Internal Error
(/home/aph/jdk-jdk/src/hotspot/share/opto/phaseX.cpp:691) when I run
it with your patch. I think you are not handling the case where both
arguments are constant, and you need to do that. It might be
sufficient simply to say
  if (a->is_Con() || b->is_Con()) {
    return false;
  }
but maybe you want to be more ambitious.
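
As a sketch of what the more ambitious route would have to compute: folding a min of two compile-time constants must give exactly what Math.min gives, including NaN propagation and treating -0.0 as smaller than +0.0. The Java below only models that arithmetic. It is not HotSpot code (a real fix would live in C2's C++ sources), and the names foldMin and foldChain are made up for illustration.

```java
public class MinFoldSketch {
    // Models folding min(a, b) for two constants with Math.min semantics.
    static double foldMin(double a, double b) {
        // NaN in either argument yields NaN.
        if (Double.isNaN(a)) return a;
        if (Double.isNaN(b)) return b;
        // -0.0 and +0.0 compare equal with ==, so check the sign bit:
        // a negative raw bit pattern means a is -0.0.
        if (a == 0.0 && b == 0.0) {
            return (Double.doubleToRawLongBits(a) < 0L) ? a : b;
        }
        return (a <= b) ? a : b;
    }

    // Folds a run of constants, as a compiler might after reassociating
    // a nested chain of min calls (an assumption about what C2 does).
    static double foldChain(double... consts) {
        double acc = consts[0];
        for (int i = 1; i < consts.length; i++) {
            acc = foldMin(acc, consts[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        // The constants from constExpr() above.
        System.out.println(foldChain(0.1, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1, 9.1));
    }
}
```

If the chain in constExpr() were reassociated this way, the ten constants would collapse to one, leaving a min over dnums[27], that constant and dnums[12].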
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671