[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics

Sun Feb 3 19:55:25 UTC 2019

On Sun, 3 Feb 2019 at 13:23, B. Blaser <bsrbnd at gmail.com> wrote:
>
> Hi Sandhya,
>
> On Sat, 2 Feb 2019 at 17:57, Viswanathan, Sandhya
> <sandhya.viswanathan at intel.com> wrote:
> >
> > Hi Bernard,
> >
> > There were two problems in the patch:
> > 1) The instruction encoding of 'minpd' and 'maxpd' (it was being encoded as minps, maxps).
> > 2) The registers for operands 'a' and 'b' were being swapped but the 'effect' in the instruct only showed it as USE.
> >     Two alternative here:
> >         a)  Mark the register as USE_KILL but that can only be used on fixed registers (a restriction on register allocator).
> >         b)  Use another temporary register and don’t overwrite 'a' and 'b'.
> >     So we went with b).
>
> Yes, sorry, I went too quickly:
>
> 2849 // Following pseudo code describes the algorithm for max[FD]:
> 2850 // Min algorithm is on similar lines
> 2851 //  btmp = (b < 0) ? a : b
> 2852 //  atmp = (b < 0) ? b : a
> 2853 //  Tmp  = Max_Float( atmp , btmp)
> 2854 //  Res  = (atmp == NaN) ? atmp : Tmp
>
> seems good to me.
>
> > We will do the test-tier1 for 32 bit and get back. We will also update the copyright year and maxss, cmpss change in the next update.
>
> Perfect.

One more note about the rules you added in x86_64.ad to handle legacy registers.

Consider the following example:

class MinMax {
    public static void main(String... args) {
        System.out.println(test());
    }
    public static double test() {
        double d = 0.0;
        for (; d < 1000000.0; d = java.lang.Math.max(d, d+1)) ;
        return d;
    }
}

Running it like this:

$ java -XX:+PrintOptoAssembly -Xcomp -XX:+InlineIntrinsics
-XX:-TieredCompilation -XX:CompileOnly=MinMax::test MinMax

gives:

movsd   XMM5, [rsp + #0]    # spill
vaddsd  XMM1, XMM5, [constant table base + #0]    # load from constant
table: double=#1.000000
movsd XMM1,XMM1    ! load double (8 bytes)
blendvpd         XMM3,XMM1,XMM5,XMM1
blendvpd         XMM2,XMM5,XMM1,XMM1
vmaxpd           XMM4,XMM2,XMM3
cmppd.unordered  XMM3, XMM2, XMM2
blendvpd         XMM1,XMM4,XMM2,XMM3

Unless I missed anything, I guess 'movsd XMM1, XMM1' is useless?

You should probably be able to simply get rid of 'legRegD' as you
explicitly use XMM registers (for example '$a$$XMMRegister').
Thus 'instruct maxD_reg(regD dst, regD a, regD b, regD tmp, regD atmp,
regD btmp)' would give:

movsd   XMM5, [rsp + #0]    # spill
vaddsd  XMM1, XMM5, [constant table base + #0]    # load from constant
table: double=#1.000000
blendvpd         XMM3,XMM1,XMM5,XMM1
blendvpd         XMM2,XMM5,XMM1,XMM1
vmaxpd           XMM4,XMM2,XMM3
cmppd.unordered  XMM3, XMM2, XMM2
blendvpd         XMM1,XMM4,XMM2,XMM3

Please also verify the parameters of the test case, given the example
above, I'm not sure the intrinsic is really inlined as expected, I
suspect 'Math.max()' is simply compiled as before.

Thanks,
Bernard

> > Regarding the test case, it is from Pengfei.Li at arm.com. http://cr.openjdk.java.net/~pli/rfr/8212043/webrev.04 . Andrew Dinn missed pushing it as part of the commit for 8212043.
> > It would be better if Pengfei/Andrew push it. I will request them and remove from this patch.
>
> Since the test case may be slightly updated (some parameters),
> shouldn't we push it as part of this patch adding a second
> 'contributed-by' to Pengfei?
>
> Thanks,
> Bernard
>
> > Best Regards,
> > Sandhya