[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics
B. Blaser
bsrbnd at gmail.com
Sun Feb 3 19:55:25 UTC 2019
On Sun, 3 Feb 2019 at 13:23, B. Blaser <bsrbnd at gmail.com> wrote:
>
> Hi Sandhya,
>
> On Sat, 2 Feb 2019 at 17:57, Viswanathan, Sandhya
> <sandhya.viswanathan at intel.com> wrote:
> >
> > Hi Bernard,
> >
> > There were two problems in the patch:
> > 1) The instruction encoding of 'minpd' and 'maxpd' (it was being encoded as minps, maxps).
> > 2) The registers for operands 'a' and 'b' were being swapped but the 'effect' in the instruct only showed it as USE.
> > Two alternative here:
> > a) Mark the register as USE_KILL but that can only be used on fixed registers (a restriction on register allocator).
> > b) Use another temporary register and don’t overwrite 'a' and 'b'.
> > So we went with b).
>
> Yes, sorry, I went too quickly:
>
> 2849 // Following pseudo code describes the algorithm for max[FD]:
> 2850 // Min algorithm is on similar lines
> 2851 // btmp = (b < 0) ? a : b
> 2852 // atmp = (b < 0) ? b : a
> 2853 // Tmp = Max_Float( atmp , btmp)
> 2854 // Res = (atmp == NaN) ? atmp : Tmp
>
> seems good to me.
>
> > We will do the test-tier1 for 32 bit and get back. We will also update the copyright year and maxss, cmpss change in the next update.
>
> Perfect.
One more note about the rules you added in x86_64.ad to handle legacy registers.
Consider the following example:
class MinMax {
public static void main(String... args) {
System.out.println(test());
}
public static double test() {
double d = 0.0;
for (; d < 1000000.0; d = java.lang.Math.max(d, d+1)) ;
return d;
}
}
Running it like this:
$ java -XX:+PrintOptoAssembly -Xcomp -XX:+InlineIntrinsics
-XX:-TieredCompilation -XX:CompileOnly=MinMax::test MinMax
gives:
movsd XMM5, [rsp + #0] # spill
vaddsd XMM1, XMM5, [constant table base + #0] # load from constant
table: double=#1.000000
movsd XMM1,XMM1 ! load double (8 bytes)
blendvpd XMM3,XMM1,XMM5,XMM1
blendvpd XMM2,XMM5,XMM1,XMM1
vmaxpd XMM4,XMM2,XMM3
cmppd.unordered XMM3, XMM2, XMM2
blendvpd XMM1,XMM4,XMM2,XMM3
Unless I missed anything, I guess 'movsd XMM1, XMM1' is useless?
You should probably be able to simply get rid of 'legRegD' as you
explicitly use XMM registers (for example '$a$$XMMRegister').
Thus 'instruct maxD_reg(regD dst, regD a, regD b, regD tmp, regD atmp,
regD btmp)' would give:
movsd XMM5, [rsp + #0] # spill
vaddsd XMM1, XMM5, [constant table base + #0] # load from constant
table: double=#1.000000
blendvpd XMM3,XMM1,XMM5,XMM1
blendvpd XMM2,XMM5,XMM1,XMM1
vmaxpd XMM4,XMM2,XMM3
cmppd.unordered XMM3, XMM2, XMM2
blendvpd XMM1,XMM4,XMM2,XMM3
Please also verify the parameters of the test case, given the example
above, I'm not sure the intrinsic is really inlined as expected, I
suspect 'Math.max()' is simply compiled as before.
Thanks,
Bernard
> > Regarding the test case, it is from Pengfei.Li at arm.com. http://cr.openjdk.java.net/~pli/rfr/8212043/webrev.04 . Andrew Dinn missed pushing it as part of the commit for 8212043.
> > It would be better if Pengfei/Andrew push it. I will request them and remove from this patch.
>
> Since the test case may be slightly updated (some parameters),
> shouldn't we push it as part of this patch adding a second
> 'contributed-by' to Pengfei?
>
> Thanks,
> Bernard
>
> > Best Regards,
> > Sandhya
More information about the hotspot-compiler-dev
mailing list