[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics

Mon Feb 4 02:15:27 UTC 2019

Hi Blaser,

Please find response embedded in following mail.

Regards,
Jatin

-----Original Message-----
From: B. Blaser [mailto:bsrbnd at gmail.com]
Sent: Monday, February 4, 2019 1:25 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov <vladimir.kozlov at oracle.com>; Bhateja, Jatin <jatin.bhateja at intel.com>; Deshpande, Vivek R <vivek.r.deshpande at intel.com>
Subject: Re: [PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics

On Sun, 3 Feb 2019 at 13:23, B. Blaser <bsrbnd at gmail.com<mailto:bsrbnd at gmail.com>> wrote:
>
> Hi Sandhya,
>
> On Sat, 2 Feb 2019 at 17:57, Viswanathan, Sandhya
> <sandhya.viswanathan at intel.com<mailto:sandhya.viswanathan at intel.com>> wrote:
> >
> > Hi Bernard,
> >
> > There were two problems in the patch:
> > 1) The instruction encoding of 'minpd' and 'maxpd' (it was being encoded as minps, maxps).
> > 2) The registers for operands 'a' and 'b' were being swapped but the 'effect' in the instruct only showed it as USE.
> >     Two alternative here:
> >         a)  Mark the register as USE_KILL but that can only be used on fixed registers (a restriction on register allocator).
> >         b)  Use another temporary register and don’t overwrite 'a' and 'b'.
> >     So we went with b).
>
> Yes, sorry, I went too quickly:
>
> 2849 // Following pseudo code describes the algorithm for max[FD]:
> 2850 // Min algorithm is on similar lines
> 2851 //  btmp = (b < 0) ? a : b
> 2852 //  atmp = (b < 0) ? b : a
> 2853 //  Tmp  = Max_Float( atmp , btmp)
> 2854 //  Res  = (atmp == NaN) ? atmp : Tmp
>
> seems good to me.
>
> > We will do the test-tier1 for 32 bit and get back. We will also update the copyright year and maxss, cmpss change in the next update.
>
> Perfect.

One more note about the rules you added in x86_64.ad to handle legacy registers.

Consider the following example:

class MinMax {
    public static void main(String... args) {
        System.out.println(test());
    }
    public static double test() {
        double d = 0.0;
        for (; d < 1000000.0; d = java.lang.Math.max(d, d+1)) ;
        return d;
    }
}

Running it like this:

$ java -XX:+PrintOptoAssembly -Xcomp -XX:+InlineIntrinsics -XX:-TieredCompilation -XX:CompileOnly=MinMax::test MinMax
gives:

movsd   XMM5, [rsp + #0]    # spill
vaddsd  XMM1, XMM5, [constant table base + #0]    # load from constant
table: double=#1.000000
movsd XMM1,XMM1    ! load double (8 bytes)
blendvpd         XMM3,XMM1,XMM5,XMM1
blendvpd         XMM2,XMM5,XMM1,XMM1
vmaxpd           XMM4,XMM2,XMM3
cmppd.unordered  XMM3, XMM2, XMM2
blendvpd         XMM1,XMM4,XMM2,XMM3

Unless I missed anything, I guess 'movsd XMM1, XMM1' is useless?

JATIN ≫ It's useless in this case, but its micro operation will not contend for execution unit in backend.

You should probably be able to simply get rid of 'legRegD' as you explicitly use XMM registers (for example '$a$$XMMRegister').

JATIN ≫ blend's instruction uses VEX encoding which constrains it from accessing XMM16-XMM31. Usage of new register class legRegD
                will constrain register allocator to allocate a legal physical register as per its encoding and in that case this move instruction will be useful.

Thus 'instruct maxD_reg(regD dst, regD a, regD b, regD tmp, regD atmp, regD btmp)' would give:

movsd   XMM5, [rsp + #0]    # spill
vaddsd  XMM1, XMM5, [constant table base + #0]    # load from constant
table: double=#1.000000
blendvpd         XMM3,XMM1,XMM5,XMM1
blendvpd         XMM2,XMM5,XMM1,XMM1
vmaxpd           XMM4,XMM2,XMM3
cmppd.unordered  XMM3, XMM2, XMM2
blendvpd         XMM1,XMM4,XMM2,XMM3

Please also verify the parameters of the test case, given the example above, I'm not sure the intrinsic is really inlined as expected, I suspect 'Math.max()' is simply compiled as before.

JATIN >> Intrinsification is indeed happening in the test you provided.

Thanks,
Bernard

> > Regarding the test case, it is from Pengfei.Li at arm.com<mailto:Pengfei.Li at arm.com>. http://cr.openjdk.java.net/~pli/rfr/8212043/webrev.04 . Andrew Dinn missed pushing it as part of the commit for 8212043.
> > It would be better if Pengfei/Andrew push it. I will request them and remove from this patch.
>
> Since the test case may be slightly updated (some parameters),
> shouldn't we push it as part of this patch adding a second
> 'contributed-by' to Pengfei?
>
> Thanks,
> Bernard
>
> > Best Regards,
> > Sandhya

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20190204/8e5b2d3e/attachment-0001.html>