[vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API

Viswanathan, Sandhya sandhya.viswanathan at intel.com
Tue Dec 3 00:14:42 UTC 2019


Hi Jatin,

The patch looks good to me.

Best Regards,
Sandhya

-----Original Message-----
From: Bhateja, Jatin <jatin.bhateja at intel.com> 
Sent: Saturday, November 30, 2019 4:00 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Cc: panama-dev at openjdk.java.net; John Rose <john.r.rose at oracle.com>
Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API

Hi Sandhya,

Please find below update patch.
http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webrev.03/

Thanks,
Jatin

> -----Original Message-----
> From: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Sent: Sunday, November 24, 2019 8:43 AM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>; John Rose 
> <john.r.rose at oracle.com>
> Cc: panama-dev at openjdk.java.net
> Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of 
> iotaShuffle API
> 
> Hi Jatin,
> 
> Thanks for incorporating the suggestions.  Couple of improvements:
> 1) In library_call.cpp, if num_elem is not power of 2, return false (no intrinsic).
> 2) In library_call.cpp, lines 7151 to 7166 only needs to be done if 
> (step != 1 || start != 0) or if either of them is not a constant.
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: Bhateja, Jatin <jatin.bhateja at intel.com>
> Sent: Friday, November 22, 2019 4:20 AM
> To: John Rose <john.r.rose at oracle.com>; Viswanathan, Sandhya 
> <sandhya.viswanathan at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of 
> iotaShuffle API
> 
> Hi  John, Sandhya,
> 
> Please find below an updated patch which does suggested strength 
> reductions.
> 
> http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webr
> ev.01/
> 
> My understanding is that vector-API is a base level API and will be 
> used intelligently by the users, thus apart from macro APIs like 
> iotaShuffle , addIndex (may be few others) compiler based strength 
> reduction may not be needed.
> 
> For auto-vectorization flow, scalar graph which is fed to SuperWord is 
> already strength reduced by construction since Node*::Identity 
> routines does the necessary job.
> 
> Best Regards,
> Jatin
> 
> > -----Original Message-----
> > From: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> > Sent: Thursday, November 14, 2019 3:30 AM
> > To: John Rose <john.r.rose at oracle.com>; Bhateja, Jatin 
> > <jatin.bhateja at intel.com>
> > Cc: panama-dev at openjdk.java.net
> > Subject: RE: [vector-api][X86] Intrinsic creation for missing cases 
> > of iotaShuffle API
> >
> > Hi John,
> >
> > I like your suggestion to check if the step is constant and has a 
> > value 1 in library_call.cpp before generating the MulI node. 
> > Likewise, the AddI node generation can be avoided if the start is 0.
> >
> > The Blend node generation can only be removed if the start also is 0 
> > along with step being 1 at parsing time. If start is non zero, then 
> > index can go out of range. Similarly, the do_wrap path can be 
> > optimized for
> start=0, step=1.
> >
> > I don’t think any strength reduction is happening today on vector 
> > nodes after generation.
> >
> > Best Regards,
> > Sandhya
> >
> > -----Original Message-----
> > From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of 
> > John Rose
> > Sent: Wednesday, November 13, 2019 11:37 AM
> > To: Bhateja, Jatin <jatin.bhateja at intel.com>
> > Cc: panama-dev at openjdk.java.net
> > Subject: Re: [vector-api][X86] Intrinsic creation for missing cases 
> > of iotaShuffle API
> >
> > On Nov 13, 2019, at 7:53 AM, Bhateja, Jatin 
> > <jatin.bhateja at intel.com>
> wrote:
> > >
> > > Hi All,
> > >
> > > Please find below a patch which intrinsifies following missing 
> > > cases of iotaShuffle :-
> > >
> > >  1.  scale > 1.
> > >  2.  wrap = false.
> > >
> > > http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/
> > > we
> > > br
> > > ev.00/
> >
> > Thank you Jatin for fixing some of my “FIXME” comments!
> >
> > In AbstractSpecies.java there seems to be a funny 2-space indentation.
> > Please align it to the ambient 4-space convention.
> >
> > In loopTransform.cpp I would encourage you to make your change more 
> > incremental.
> > Your change re-orders the existing predicate terms while inserting 
> > the new term at the beginning.  It is more conventional to add new 
> > logic at the end, while retaining the flow of the old logic.  I 
> > think breaking the lines *is* a good idea, so something like this is good:
> >
> > -      if (opc == Op_StoreP || opc == Op_StoreN || opc == Op_StoreNKlass ||
> > opc == Op_StoreCM) {
> > +-      if (opc == Op_StoreVector || opc == Op_StoreP ||
> > ++      if (opc == Op_StoreP ||
> > +           opc == Op_StoreN || opc == Op_StoreNKlass ||
> > +-          opc == Op_StoreCM) {
> > ++          opc == Op_StoreCM || opc == Op_StoreVector) {
> >
> > My biggest comment (which is still pretty minor) is about
> VectorIntrinsics.java.
> >
> > The handling of arguments in the intrinsic is irregular and I think 
> > this is the right moment to make it align better to the other 
> > intrinsics.  Specifically, the arguments to the lambda should 
> > precede the lambda, in the same order as they are passed to the lambda.
> >
> > This means that the lambda should take the following form:
> >
> >      interface ShuffleIotaOperation<E> {
> >
> > -        VectorShuffle<E> apply(int step, int length);
> > +-        VectorShuffle<E> apply(int start, int step, int length);
> > ++        VectorShuffle<E> apply(int length, int start, int step, 
> > ++ int wrap);
> >
> >      }
> >
> > I don’t know why length and start got swapped in the first place; 
> > now is the time to fix it.  Also, wrap should be present to the 
> > lambda.  I suggest making wrap a true boolean; there’s no reason to 
> > lower it to an int, and that loses information about the range of the value.
> >
> > You can still use two lambdas if you want for the fallback, one for 
> > each value of wrap.  But I suspect using one would be cleaner.
> >
> > (I’ll defer to Vladimir on this; he may have more information about 
> > what’s right here.)
> >
> > The above adjustments might make it convenient to simplify the 
> > AbstractVector API to have just one method with a boolean rather 
> > than two methods.
> >
> > A couple comments about library_call.cpp:  There are two places 
> > where mod_val is computed as num_elem-1 and then applied with 
> > Op_AndI.  The intrinsic must check is_power_of_2(num_elem) and fail 
> > the intrinsic if that happens.  This may happen with ARM SVE—in theory.
> >
> > Also, if step is the constant 1 then there is no need to perform the 
> > Op_MulI or to build the VectorBlendNode which handles overflow at 
> > the end; all indexes will be in range.  This case should be covered 
> > even though the Java code attempts to cover it also.  The optimizer 
> > might discover a constant value of
> > step=1 that the Java code misses.
> >
> > There might be a couple kinds of technical debt here:  First, we 
> > want to strength-reduce Op_MulI to Op_LShiftI (or nop) when 
> > possible.  Does this happen downstream of the library_call.cpp code?  
> > Second, we want to keep track of what happens when the 
> > multiplication overflows because step is large, and/or the lane size 
> > is small.  That behavior has to be specified and followed both in 
> > Java code and the intrinsic—and tested.  (Also, the VectorBlendNode 
> > should nop itself away if the step turns out to be constant 1 after 
> > parsing.  Not sure how hard this is.)
> >
> > Reviewed.
> >
> > — John
> 
> 




More information about the panama-dev mailing list