[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request

B. Blaser bsrbnd at gmail.com
Mon Feb 18 16:09:45 UTC 2019


On Mon, 18 Feb 2019 at 16:37, Andrew Haley <aph at redhat.com> wrote:
>
> On 2/18/19 1:26 PM, B. Blaser wrote:
> >
> > Intrinsic instruction sequences are definitely fast and other
> > optimizations can benefit from their mathematical properties.
>
> Yes, they can be.
>
> > Of course, statistical optimizations could be even faster but making
> > assumptions about predictability to exclude intrinsics is rather
> > dangerous.
>
> I'm not convinced that it is at all dangerous. The pattern I
> illustrated is not uncommon, and might well be considerably more
> common than the pattern in the benchmark presented by Jatin. But we should
> not choose our benchmarks so that they make our code look
> good. Instead, we should use benchmarks to help us decide what to do.
>
> > The JVM should be able to decide dynamically whether to use intrinsics
> > or not depending on the reliability of its statistics?!
>
> Perhaps so, yes. So before we decide to commit changes that may well make the
> JVM worse on many (most?) workloads, we should find a way to do that.

Yes and no. Simply try your example with unfavourable data:

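import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)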
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class FpMinMaxIntrinsics {
    private static final int COUNT = 1000;

    private float[] floats = new float[COUNT];

    private Random r = new Random();

    @Setup
    public void init() {
        // Unfavourable data: alternate random values in [0, 1) with -0.0f.
        for (int i=0; i<COUNT; i++) {
            if (i % 2 == 0)
                floats[i] = r.nextFloat();
            else
                floats[i] = -0.0f;
        }
    }

    @Benchmark
    public float fMinReduce() {
        float result = Float.MAX_VALUE;

        for (int i=0; i<COUNT; i++)
            result = Math.min(result, floats[i]);

        return result;
    }
}

With the intrinsic:

Benchmark                      Mode  Cnt     Score   Error  Units
FpMinMaxIntrinsics.fMinReduce  avgt       2386.708          ns/op


Without:

Benchmark                      Mode  Cnt      Score   Error  Units
FpMinMaxIntrinsics.fMinReduce  avgt       14042.155          ns/op


The intrinsic's execution time stays stable whatever the data, so
you never get such a performance drop.
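
A minimal sketch of why the data above is unfavourable without the
intrinsic, assuming the fallback follows the Java-level definition of
Math.min for floats (NaN check, then the signed-zero check, then the
comparison). The class SignedZeroMin and the method branchyMin are
hypothetical names used only for illustration; this is not the code
that C2 actually compiles.

```java
public class SignedZeroMin {
    // Hypothetical helper: the data-dependent branches a non-intrinsic
    // float min has to evaluate (assumed to mirror Math.min's semantics).
    public static float branchyMin(float a, float b) {
        if (a != a) {
            return a;                       // a is NaN, so the result is NaN
        }
        if (a == 0.0f && b == 0.0f
                && Float.floatToRawIntBits(b) == Float.floatToRawIntBits(-0.0f)) {
            return b;                       // -0.0f is smaller than 0.0f
        }
        return (a <= b) ? a : b;            // ordinary comparison
    }

    public static void main(String[] args) {
        float[] samples = { 0.25f, -0.0f, 0.75f, -0.0f, 0.0f, Float.NaN };
        float result = Float.MAX_VALUE;
        for (float x : samples) {
            float expected = Math.min(result, x);
            float actual = branchyMin(result, x);
            // Compare bit patterns so that -0.0f and 0.0f are told apart.
            boolean same = Float.floatToIntBits(expected) == Float.floatToIntBits(actual);
            System.out.println("min(" + result + ", " + x + ") = " + actual
                    + " (matches Math.min: " + same + ")");
            result = actual;
        }
    }
}
```

Once the reduction reaches -0.0f, every further element goes through
the signed-zero test, and its outcome depends on the element itself.
A branch-free instruction sequence does not have that dependency,
which is presumably why its timing does not move with the data.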

Bernard


More information about the hotspot-compiler-dev mailing list