[aarch64-port-dev ] [PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request

Fri Mar 1 10:20:56 UTC 2019

Hi Andrew,

Please see my responses inline below.

Thanks,
Jatin

> -----Original Message-----
> From: Andrew Dinn [mailto:adinn at redhat.com]
> Sent: Friday, March 1, 2019 3:15 PM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>; Vladimir Kozlov
> <vladimir.kozlov at oracle.com>; Pengfei Li (Arm Technology China)
> <Pengfei.Li at arm.com>; B. Blaser <bsrbnd at gmail.com>; aarch64-port-
> dev at openjdk.java.net
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] [PATCH] 8217561 : X86: Add floating-point
> Math.min/max intrinsics, approval request
> 
> On 01/03/2019 02:35, Bhateja, Jatin wrote:
> 
> >>> I didn't quite understand Jatin's additional code below.
> >>> --
> >>> +#ifdef X86
> >>> +  // Being conservative since all the phi edges may not be set
> >>> +  // by now. This is done to skip over reduction scenarios.
> >>> +  if (a->is_Phi() || b->is_Phi())
> >>> +    return false;
> >>> +#endif
> >>> --
> >>> Is it going to black out *all* reduction scenarios? I see the
> >>> intrinsics benefit the reduction in some cases. And in my opinion,
> >>> adding this kind of platform-dependent macros in hotspot shared code
> >>> is not so good.
> >
> > Proposed check was added based on the common reduction scenario cases
> > which showed performance degradation with new intrinsic sequence for
> > X86.
> 
> That doesn't actually clarify things very well. Are you saying:
> 
> 1a) your patch disables FPMinMax reduction for all architectures?
> 
> or
> 
> 1b) your patch disables FPMinMax reduction for x86?
> 
> and
> 
> 2a) it does so because when reduction is enabled x86 fails to show
> performance improvement for applications of reduction?
> 
> or
> 
> 2b) it does so because when reduction is enabled x86 fails to show
> performance improvement for selection of the FPMin/Max intrinsic?
> 

The patch currently under review does not contain the above code change, which bypassed intrinsic creation for reduction patterns.
On X86, performance with the intrinsic degrades relative to the non-intrinsic implementation in reduction
scenarios, both with and without data variance (i.e. with and without branch predication effects).
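
For illustration, a reduction scenario of the kind discussed here might look like the sketch below. The class and method names are hypothetical and are not taken from the patch or from any benchmark; the second method is included only as a contrasting non-reduction case.

```java
public class MinMaxReduction {
    // Reduction: the loop-carried value 'acc' is fed back into Math.max on
    // every iteration, so one input of the max is a loop phi in the compiler's IR.
    public static float maxReduce(float[] data) {
        float acc = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < data.length; i++) {
            acc = Math.max(acc, data[i]);
        }
        return acc;
    }

    // Non-reduction: each result depends only on the two current elements,
    // with no value carried from one iteration to the next.
    public static void maxElementwise(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    public static void main(String[] args) {
        float[] data = {1.0f, 3.0f, 2.0f};
        System.out.println(maxReduce(data));
    }
}
```

In the first method each Math.max call depends on the result of the previous one, which is the dependency pattern the Phi check quoted above was meant to skip.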

I could not find suitable hooks that can be called from common code to add such target-specific checks during ideal (DAG) construction.
Please share if you know of any.

> I think you are saying 1a and 2b but I'd prefer to be sure. I would like a clear
> answer because Pengfei has a pending patch which shows significant benefit
> on AArch64 using first the FPMin/Max intrinsic and then, for extra gain,
> FPMin/Max reduction. My own investigations have not shown any detrimental
> effect to using the intrinsic or reduction and Andrew Haley seems to have
> withdrawn the claim that the intrinsic can worsen performance. So, it is quite
> important to understand what your patch does and why.
> 
> If there is some other way to avoid the slowdown on x86 (whether that
> comes with use of the intrinsic or with use of reduction) without clobbering
> the gains to be had on AArch64 then that would be preferable.
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd