[vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API

Bhateja, Jatin jatin.bhateja at intel.com
Sat Nov 30 11:59:49 UTC 2019


Hi Sandhya,

Please find below update patch.
http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webrev.03/

Thanks,
Jatin

> -----Original Message-----
> From: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Sent: Sunday, November 24, 2019 8:43 AM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>; John Rose
> <john.r.rose at oracle.com>
> Cc: panama-dev at openjdk.java.net
> Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle
> API
> 
> Hi Jatin,
> 
> Thanks for incorporating the suggestions.  Couple of improvements:
> 1) In library_call.cpp, if num_elem is not power of 2, return false (no intrinsic).
> 2) In library_call.cpp, lines 7151 to 7166 only needs to be done if (step != 1 ||
> start != 0) or if either of them is not a constant.
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: Bhateja, Jatin <jatin.bhateja at intel.com>
> Sent: Friday, November 22, 2019 4:20 AM
> To: John Rose <john.r.rose at oracle.com>; Viswanathan, Sandhya
> <sandhya.viswanathan at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle
> API
> 
> Hi  John, Sandhya,
> 
> Please find below an updated patch which does suggested strength
> reductions.
> 
> http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webrev.01/
> 
> My understanding is that vector-API is a base level API and will be used
> intelligently by the users, thus apart from macro APIs like iotaShuffle ,
> addIndex (may be few others) compiler based strength reduction may not be
> needed.
> 
> For auto-vectorization flow, scalar graph which is fed to SuperWord is already
> strength reduced by construction since Node*::Identity routines does the
> necessary job.
> 
> Best Regards,
> Jatin
> 
> > -----Original Message-----
> > From: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> > Sent: Thursday, November 14, 2019 3:30 AM
> > To: John Rose <john.r.rose at oracle.com>; Bhateja, Jatin
> > <jatin.bhateja at intel.com>
> > Cc: panama-dev at openjdk.java.net
> > Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of
> > iotaShuffle API
> >
> > Hi John,
> >
> > I like your suggestion to check if the step is constant and has a
> > value 1 in library_call.cpp before generating the MulI node. Likewise,
> > the AddI node generation can be avoided if the start is 0.
> >
> > The Blend node generation can only be removed if the start also is 0
> > along with step being 1 at parsing time. If start is non zero, then
> > index can go out of range. Similarly, the do_wrap path can be optimized for
> start=0, step=1.
> >
> > I don’t think any strength reduction is happening today on vector
> > nodes after generation.
> >
> > Best Regards,
> > Sandhya
> >
> > -----Original Message-----
> > From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of
> > John Rose
> > Sent: Wednesday, November 13, 2019 11:37 AM
> > To: Bhateja, Jatin <jatin.bhateja at intel.com>
> > Cc: panama-dev at openjdk.java.net
> > Subject: Re: [vector-api][X86] Intrinsic creation for missing cases of
> > iotaShuffle API
> >
> > On Nov 13, 2019, at 7:53 AM, Bhateja, Jatin <jatin.bhateja at intel.com>
> wrote:
> > >
> > > Hi All,
> > >
> > > Please find below a patch which intrinsifies following missing cases
> > > of iotaShuffle :-
> > >
> > >  1.  scale > 1.
> > >  2.  wrap = false.
> > >
> > > http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/we
> > > br
> > > ev.00/
> >
> > Thank you Jatin for fixing some of my “FIXME” comments!
> >
> > In AbstractSpecies.java there seems to be a funny 2-space indentation.
> > Please align it to the ambient 4-space convention.
> >
> > In loopTransform.cpp I would encourage you to make your change more
> > incremental.
> > Your change re-orders the existing predicate terms while inserting the
> > new term at the beginning.  It is more conventional to add new logic
> > at the end, while retaining the flow of the old logic.  I think
> > breaking the lines *is* a good idea, so something like this is good:
> >
> > -      if (opc == Op_StoreP || opc == Op_StoreN || opc == Op_StoreNKlass ||
> > opc == Op_StoreCM) {
> > +-      if (opc == Op_StoreVector || opc == Op_StoreP ||
> > ++      if (opc == Op_StoreP ||
> > +           opc == Op_StoreN || opc == Op_StoreNKlass ||
> > +-          opc == Op_StoreCM) {
> > ++          opc == Op_StoreCM || opc == Op_StoreVector) {
> >
> > My biggest comment (which is still pretty minor) is about
> VectorIntrinsics.java.
> >
> > The handling of arguments in the intrinsic is irregular and I think
> > this is the right moment to make it align better to the other
> > intrinsics.  Specifically, the arguments to the lambda should precede
> > the lambda, in the same order as they are passed to the lambda.
> >
> > This means that the lambda should take the following form:
> >
> >      interface ShuffleIotaOperation<E> {
> >
> > -        VectorShuffle<E> apply(int step, int length);
> > +-        VectorShuffle<E> apply(int start, int step, int length);
> > ++        VectorShuffle<E> apply(int length, int start, int step, int
> > ++ wrap);
> >
> >      }
> >
> > I don’t know why length and start got swapped in the first place; now
> > is the time to fix it.  Also, wrap should be present to the lambda.  I
> > suggest making wrap a true boolean; there’s no reason to lower it to
> > an int, and that loses information about the range of the value.
> >
> > You can still use two lambdas if you want for the fallback, one for
> > each value of wrap.  But I suspect using one would be cleaner.
> >
> > (I’ll defer to Vladimir on this; he may have more information about
> > what’s right here.)
> >
> > The above adjustments might make it convenient to simplify the
> > AbstractVector API to have just one method with a boolean rather than
> > two methods.
> >
> > A couple comments about library_call.cpp:  There are two places where
> > mod_val is computed as num_elem-1 and then applied with Op_AndI.  The
> > intrinsic must check is_power_of_2(num_elem) and fail the intrinsic if
> > that happens.  This may happen with ARM SVE—in theory.
> >
> > Also, if step is the constant 1 then there is no need to perform the
> > Op_MulI or to build the VectorBlendNode which handles overflow at the
> > end; all indexes will be in range.  This case should be covered even
> > though the Java code attempts to cover it also.  The optimizer might
> > discover a constant value of
> > step=1 that the Java code misses.
> >
> > There might be a couple kinds of technical debt here:  First, we want
> > to strength-reduce Op_MulI to Op_LShiftI (or nop) when possible.  Does
> > this happen downstream of the library_call.cpp code?  Second, we want
> > to keep track of what happens when the multiplication overflows
> > because step is large, and/or the lane size is small.  That behavior
> > has to be specified and followed both in Java code and the
> > intrinsic—and tested.  (Also, the VectorBlendNode should nop itself
> > away if the step turns out to be constant 1 after parsing.  Not sure
> > how hard this is.)
> >
> > Reviewed.
> >
> > — John
> 
> 



More information about the panama-dev mailing list