[aarch64-port-dev ] [PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request

Fri Mar 1 02:35:02 UTC 2019

Hi Pengfei,

Please find my response inline below.

Best Regards,
Jatin

> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, March 1, 2019 12:57 AM
> To: Pengfei Li (Arm Technology China) <Pengfei.Li at arm.com>; Bhateja, Jatin
> <jatin.bhateja at intel.com>; B. Blaser <bsrbnd at gmail.com>; aarch64-port-
> dev at openjdk.java.net
> Cc: hotspot-compiler-dev at openjdk.java.net; Viswanathan, Sandhya
> <sandhya.viswanathan at intel.com>
> Subject: Re: [aarch64-port-dev ] [PATCH] 8217561 : X86: Add floating-point
> Math.min/max intrinsics, approval request
> 
> Thank you, Pengfei
> 
> Then let's keep the branch prediction heuristic shared. I take back my previous
> suggestion to have a function for it.
> 
> Jatin, can you answer Pengfei's question about your change?
> 
> Thanks,
> Vladimir
> 
> On 2/27/19 10:45 PM, Pengfei Li (Arm Technology China) wrote:
> > Hi Vladimir, Jatin and All,
> >
> >> So I have a question for aarch64 developers. Are aarch64 fmin/fmax
> >> instructions always faster than the code generated by default? If
> >> this is true, the new conditions should be x86 specific, with a
> >> separate function to do these checks. We have a precedent:
> >> clear_upper_avx(). Maybe later we will have to add other conditions
> >> for other platforms too.
> >
> > I am the author of the original AArch64 fmin/fmax intrinsics patch[1], but
> > not a reviewer.
> >
> > Both Andrew Haley and I have tested the performance of the AArch64
> > fmin/fmax instructions before. As far as I remember, the results were
> > similar to what we have seen here on x86. When selecting the min/max
> > values from an array of random numbers, the fmin/fmax instructions
> > perform better. For an already (almost) sorted array, they make
> > performance worse, but not by much. So personally I think adding a
> > heuristic in shared code would benefit AArch64 as well.
> >
> > I didn't quite understand Jatin's additional code below.
> > --
> > +#ifdef X86
> > +  // Being conservative since all the phi edges may not be set
> > +  // by now. This is done to skip over reduction scenarios.
> > +  if (a->is_Phi() || b->is_Phi())
> > +    return false;
> > +#endif
> > --
> > Is it going to black out *all* reduction scenarios? I see the intrinsics
> > benefit the reduction in some cases. And in my opinion, adding this kind
> > of platform-dependent macro in hotspot shared code is not so good.

The proposed check was added because the common reduction scenarios showed
performance degradation with the new intrinsic sequence on x86.
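
For illustration only, a reduction of the kind discussed in this thread might
look like the sketch below. The class and method names are hypothetical and are
not taken from the patch or from any benchmark used to evaluate it. The
accumulator is carried around the loop, so one input of Math.max would
typically be a Phi node in the compiler's IR. The two input generators
correspond to the random and (almost) sorted arrays mentioned above.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; not code from the patch or its benchmarks.
final class MaxReduction {
    // Loop-carried reduction: 'max' is redefined on every iteration, so the
    // value fed back into Math.max comes from the previous iteration.
    public static float maxOf(float[] a) {
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, a[i]);
        }
        return max;
    }

    // Random input: whether a[i] exceeds the running max is hard to predict.
    public static float[] randomInput(int n, long seed) {
        Random r = new Random(seed);
        float[] a = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = r.nextFloat();
        }
        return a;
    }

    // Sorted input: a[i] is never below the running max, so a compare and
    // branch sequence would take the same path on every iteration.
    public static float[] sortedInput(int n, long seed) {
        float[] a = randomInput(n, seed);
        Arrays.sort(a);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(maxOf(randomInput(1 << 20, 42L)));
        System.out.println(maxOf(sortedInput(1 << 20, 42L)));
    }
}
```

This only shows the shape of the loop; measuring the effect would need a
proper harness such as JMH and is not attempted here.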

> >
> > [1] http://hg.openjdk.java.net/jdk/jdk/rev/f15af1e2c683
> >
> > --
> > Thanks,
> > Pengfei
> >