[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request
B. Blaser
bsrbnd at gmail.com
Sun Feb 17 15:36:02 UTC 2019
Hi Jatin, Vladimir, All,
On Sun, 17 Feb 2019 at 10:02, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
>
> Hi Vladimir,
>
> I have added a benchmark over the revision shared by Bernard
>
> Following is the link to updated patch.
>
> http://cr.openjdk.java.net/~sviswanathan/Jatin/8217561/webrev.04/
>
> Regards,
> Jatin
>
> > -----Original Message-----
> > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> > Sent: Saturday, February 16, 2019 7:19 AM
> >
> > Eric C. asked to add JMH microbanchmark to show performance of new
> > code. You may be missed his e-mail:
> >
> > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-
> > February/032830.html
> >
> > We need to see benefits from these changes and verify them before pushing
> > it.
> >
> > Thanks,
> > Vladimir
Thanks for your benchmark, Jatin, which is a good start.
Unfortunately, it's biased because the loops are far too much
predictable and modulo operations very expensive vs min/max.
Therefore, the results don't seem to be really representative.
The legal header looks also unusual and please use 4 spaces
indentation for Java files.
I wrote another benchmark, here under and also attached, to be added
to webrev.04.
Here are the results on x86_64 (xeon), before:
$ make test TEST="micro:vm.compiler.FpMinMaxIntrinsics"
MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/FpMinMaxIntrinsics.*Bench*
-XX:-InlineMathNatives';FORK=1;WARMUP_ITER=1;ITER=3"
089 movsd XMM0, [R10 + #16 + RAX << #3] # double
[...]
099 movsd XMM1, [R10 + #16 + R8 << #3] # double
[...]
0c0 ucomisd XMM0, XMM1 test
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 3 8786.771 ± 1853.250 ns/op
FpMinMaxIntrinsics.dMin avgt 3 9066.529 ± 1257.368 ns/op
FpMinMaxIntrinsics.fMax avgt 3 8655.304 ± 2056.253 ns/op
FpMinMaxIntrinsics.fMin avgt 3 8639.211 ± 1400.143 ns/op
and after:
$ make test TEST="micro:vm.compiler.FpMinMaxIntrinsics"
MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/FpMinMaxIntrinsics.*Bench*
-XX:+InlineMathNatives';FORK=1;WARMUP_ITER=1;ITER=3"
085 movsd XMM4, [R11 + #16 + RAX << #3] # double
[...]
095 movsd XMM0, [R11 + #16 + R8 << #3] # double
[...]
09c blendvpd XMM2,XMM0,XMM4,XMM0
09c blendvpd XMM1,XMM4,XMM0,XMM0
09c vmaxsd XMM3,XMM1,XMM2
09c cmppd.unordered XMM2,XMM1,XMM1
09c blendvpd XMM0,XMM3,XMM1,XMM2
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 3 3971.653 ± 855.886 ns/op
FpMinMaxIntrinsics.dMin avgt 3 3787.000 ± 524.271 ns/op
FpMinMaxIntrinsics.fMax avgt 3 3786.412 ± 1013.097 ns/op
FpMinMaxIntrinsics.fMin avgt 3 4051.351 ± 2327.196 ns/op
We see that the new instruction sequence is about 2x faster than the
previous one.
Could we have a Reviewer feedback and eventually a definitive approval for this?
Thanks,
Bernard
package org.openjdk.bench.vm.compiler;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.*;
import java.util.concurrent.TimeUnit;
import java.util.Random;
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class FpMinMaxIntrinsics {
private static final int COUNT = 1000;
private double[] doubles = new double[COUNT];
private float[] floats = new float[COUNT];
private int c1, c2, s1, s2;
private Random r = new Random();
@Setup
public void init() {
c1 = s1 = step();
c2 = COUNT - (s2 = step());
for (int i=0; i<COUNT; i++) {
floats[i] = r.nextFloat();
doubles[i] = r.nextDouble();
}
}
private int step() {
return (r.nextInt() & 0xf) + 1;
}
@Benchmark
public void dMax(Blackhole bh) {
for (int i=0; i<COUNT; i++)
bh.consume(dMaxBench());
}
@Benchmark
public void dMin(Blackhole bh) {
for (int i=0; i<COUNT; i++)
bh.consume(dMinBench());
}
@Benchmark
public void fMax(Blackhole bh) {
for (int i=0; i<COUNT; i++)
bh.consume(fMaxBench());
}
@Benchmark
public void fMin(Blackhole bh) {
for (int i=0; i<COUNT; i++)
bh.consume(fMinBench());
}
private double dMaxBench() {
inc();
return Math.max(doubles[c1], doubles[c2]);
}
private double dMinBench() {
inc();
return Math.min(doubles[c1], doubles[c2]);
}
private float fMaxBench() {
inc();
return Math.max(floats[c1], floats[c2]);
}
private float fMinBench() {
inc();
return Math.min(floats[c1], floats[c2]);
}
private void inc() {
c1 = c1 + s1 < COUNT ? c1 + s1 : (s1 = step());
c2 = c2 - s2 > 0 ? c2 - s2 : COUNT - (s2 = step());
}
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jdk8217561-webrev04-jmh.patch
Type: text/x-patch
Size: 3344 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20190217/b8fdd7b2/jdk8217561-webrev04-jmh.patch>
More information about the hotspot-compiler-dev
mailing list