[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request

B. Blaser bsrbnd at gmail.com
Sun Feb 17 15:36:02 UTC 2019


Hi Jatin, Vladimir, All,

On Sun, 17 Feb 2019 at 10:02, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
>
> Hi Vladimir,
>
> I have added a benchmark on top of the revision shared by Bernard.
>
> Following is the link to the updated patch.
>
> http://cr.openjdk.java.net/~sviswanathan/Jatin/8217561/webrev.04/
>
> Regards,
> Jatin
>
> > -----Original Message-----
> > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> > Sent: Saturday, February 16, 2019 7:19 AM
> >
> > Eric C. asked to add a JMH microbenchmark to show the performance of the
> > new code. You may have missed his e-mail:
> >
> > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-February/032830.html
> >
> > We need to see the benefits from these changes and verify them before
> > pushing.
> >
> > Thanks,
> > Vladimir

Thanks for your benchmark, Jatin, which is a good start.

Unfortunately, it's biased: the loops are far too predictable and the
modulo operations are very expensive compared to min/max, so the
results don't seem to be really representative.
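
To make that concrete, a method along these lines inside the benchmark
class (purely illustrative, not the actual code in webrev.04) lets the
integer division and the predictable branch dominate the measurement:

    // Illustrative only (not the code in webrev.04): advancing the index
    // with '%' makes the pattern fully predictable, and the integer
    // division behind '%' easily costs more than the min/max under test.
    private double dMaxBiased(int i) {          // i: non-negative loop counter
        int c = (i * 17) % COUNT;               // idiv on every iteration
        return Math.max(doubles[c], doubles[0]);
    }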

The legal header also looks unusual, and please use 4-space
indentation for Java files.

I wrote another benchmark, included below and also attached, to be
added to webrev.04.

Here are the results on x86_64 (Xeon), before:

$ make test TEST="micro:vm.compiler.FpMinMaxIntrinsics"
MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/FpMinMaxIntrinsics.*Bench*
-XX:-InlineMathNatives';FORK=1;WARMUP_ITER=1;ITER=3"

089       movsd   XMM0, [R10 + #16 + RAX << #3]    # double
[...]
099       movsd   XMM1, [R10 + #16 + R8 << #3]    # double
[...]
0c0       ucomisd XMM0, XMM1 test


Benchmark                Mode  Cnt     Score      Error  Units
FpMinMaxIntrinsics.dMax  avgt    3  8786.771 ± 1853.250  ns/op
FpMinMaxIntrinsics.dMin  avgt    3  9066.529 ± 1257.368  ns/op
FpMinMaxIntrinsics.fMax  avgt    3  8655.304 ± 2056.253  ns/op
FpMinMaxIntrinsics.fMin  avgt    3  8639.211 ± 1400.143  ns/op


and after:

$ make test TEST="micro:vm.compiler.FpMinMaxIntrinsics"
MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/FpMinMaxIntrinsics.*Bench*
-XX:+InlineMathNatives';FORK=1;WARMUP_ITER=1;ITER=3"

085       movsd   XMM4, [R11 + #16 + RAX << #3]    # double
[...]
095       movsd   XMM0, [R11 + #16 + R8 << #3]    # double
[...]
09c       blendvpd         XMM2,XMM0,XMM4,XMM0
09c       blendvpd         XMM1,XMM4,XMM0,XMM0
09c       vmaxsd           XMM3,XMM1,XMM2
09c       cmppd.unordered  XMM2,XMM1,XMM1
09c       blendvpd         XMM0,XMM3,XMM1,XMM2


Benchmark                Mode  Cnt     Score      Error  Units
FpMinMaxIntrinsics.dMax  avgt    3  3971.653 ±  855.886  ns/op
FpMinMaxIntrinsics.dMin  avgt    3  3787.000 ±  524.271  ns/op
FpMinMaxIntrinsics.fMax  avgt    3  3786.412 ± 1013.097  ns/op
FpMinMaxIntrinsics.fMin  avgt    3  4051.351 ± 2327.196  ns/op


We see that the new instruction sequence is about 2x faster than the
previous one.
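
For context on why the sequence is longer than a bare vmaxsd: Java's
Math.max/min must return NaN when either argument is NaN and must treat
+0.0 as greater than -0.0, which a bare maxsd/minsd does not give by
itself. A plain-Java sketch of those semantics (based on the Math.max
specification, not code taken from the patch):

    // Sketch of the semantics the intrinsic has to preserve
    // (per the Math.max(double, double) specification); illustrative only.
    static double maxSemantics(double a, double b) {
        if (Double.isNaN(a) || Double.isNaN(b))
            return Double.NaN;                      // NaN is contagious
        if (a == 0.0d && b == 0.0d)                 // +0.0 must win over -0.0
            return Double.doubleToRawLongBits(a) == 0L ? a : b;
        return a >= b ? a : b;
    }
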
Could we have feedback from a Reviewer and, hopefully, a definitive
approval for this?

Thanks,
Bernard


package org.openjdk.bench.vm.compiler;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.*;

import java.util.concurrent.TimeUnit;
import java.util.Random;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class FpMinMaxIntrinsics {
    private static final int COUNT = 1000;

    private double[] doubles = new double[COUNT];
    private float[] floats = new float[COUNT];

    // Array cursors (c1, c2) and the random strides (s1, s2) they advance by.
    private int c1, c2, s1, s2;

    private Random r = new Random();

    @Setup
    public void init() {
        // Start c1 near the beginning and c2 near the end of the arrays.
        c1 = s1 = step();
        c2 = COUNT - (s2 = step());

        for (int i=0; i<COUNT; i++) {
            floats[i] = r.nextFloat();
            doubles[i] = r.nextDouble();
        }
    }

    // Random stride in [1, 16], so the access pattern is not predictable.
    private int step() {
        return (r.nextInt() & 0xf) + 1;
    }

    @Benchmark
    public void dMax(Blackhole bh) {
        for (int i=0; i<COUNT; i++)
            bh.consume(dMaxBench());
    }

    @Benchmark
    public void dMin(Blackhole bh) {
        for (int i=0; i<COUNT; i++)
            bh.consume(dMinBench());
    }

    @Benchmark
    public void fMax(Blackhole bh) {
        for (int i=0; i<COUNT; i++)
            bh.consume(fMaxBench());
    }

    @Benchmark
    public void fMin(Blackhole bh) {
        for (int i=0; i<COUNT; i++)
            bh.consume(fMinBench());
    }

    private double dMaxBench() {
        inc();
        return Math.max(doubles[c1], doubles[c2]);
    }

    private double dMinBench() {
        inc();
        return Math.min(doubles[c1], doubles[c2]);
    }

    private float fMaxBench() {
        inc();
        return Math.max(floats[c1], floats[c2]);
    }

    private float fMinBench() {
        inc();
        return Math.min(floats[c1], floats[c2]);
    }

    // Move c1 forward and c2 backward by their strides, picking a fresh
    // random stride whenever a cursor wraps, keeping the indices (and the
    // wrap branches) hard to predict without paying for a modulo.
    private void inc() {
        c1 = c1 + s1 < COUNT ? c1 + s1 : (s1 = step());
        c2 = c2 - s2 > 0 ? c2 - s2 : COUNT - (s2 = step());
    }
}
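
If it helps while reviewing, the benchmark can also be run directly
with the JMH runner instead of "make test"; a minimal sketch, assuming
JMH is on the classpath (FpMinMaxRunner is just an illustrative name):

    package org.openjdk.bench.vm.compiler;

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class FpMinMaxRunner {
        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder()
                    .include(FpMinMaxIntrinsics.class.getSimpleName())
                    // Mirror the VM_OPTIONS used in the make invocation above.
                    .jvmArgsAppend("-XX:+InlineMathNatives")
                    .forks(1)
                    .warmupIterations(1)
                    .measurementIterations(3)
                    .build();
            new Runner(opt).run();
        }
    }
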
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jdk8217561-webrev04-jmh.patch
Type: text/x-patch
Size: 3344 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20190217/b8fdd7b2/jdk8217561-webrev04-jmh.patch>

