[vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API

Bhateja, Jatin jatin.bhateja at intel.com
Fri Nov 22 12:20:16 UTC 2019


Hi  John, Sandhya,

Please find below an updated patch which does suggested strength reductions.

http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webrev.01/

My understanding is that vector-API is a base level API and will be used intelligently by the users,
thus apart from macro APIs like iotaShuffle , addIndex (may be few others) compiler based
strength reduction may not be needed. 

For auto-vectorization flow, scalar graph which is fed to SuperWord is already strength reduced
by construction since Node*::Identity routines does the necessary job.

Best Regards,
Jatin

> -----Original Message-----
> From: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Sent: Thursday, November 14, 2019 3:30 AM
> To: John Rose <john.r.rose at oracle.com>; Bhateja, Jatin
> <jatin.bhateja at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle
> API
> 
> Hi John,
> 
> I like your suggestion to check if the step is constant and has a value 1 in
> library_call.cpp before generating the MulI node. Likewise, the AddI node
> generation can be avoided if the start is 0.
> 
> The Blend node generation can only be removed if the start also is 0 along
> with step being 1 at parsing time. If start is non zero, then index can go out of
> range. Similarly, the do_wrap path can be optimized for start=0, step=1.
> 
> I don’t think any strength reduction is happening today on vector nodes after
> generation.
> 
> Best Regards,
> Sandhya
> 
> -----Original Message-----
> From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of
> John Rose
> Sent: Wednesday, November 13, 2019 11:37 AM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle
> API
> 
> On Nov 13, 2019, at 7:53 AM, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
> >
> > Hi All,
> >
> > Please find below a patch which intrinsifies following missing cases
> > of iotaShuffle :-
> >
> >  1.  scale > 1.
> >  2.  wrap = false.
> >
> > http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webr
> > ev.00/
> 
> Thank you Jatin for fixing some of my “FIXME” comments!
> 
> In AbstractSpecies.java there seems to be a funny 2-space indentation.
> Please align it to the ambient 4-space convention.
> 
> In loopTransform.cpp I would encourage you to make your change more
> incremental.
> Your change re-orders the existing predicate terms while inserting the new
> term at the beginning.  It is more conventional to add new logic at the end,
> while retaining the flow of the old logic.  I think breaking the lines *is* a good
> idea, so something like this is good:
> 
> -      if (opc == Op_StoreP || opc == Op_StoreN || opc == Op_StoreNKlass ||
> opc == Op_StoreCM) {
> +-      if (opc == Op_StoreVector || opc == Op_StoreP ||
> ++      if (opc == Op_StoreP ||
> +           opc == Op_StoreN || opc == Op_StoreNKlass ||
> +-          opc == Op_StoreCM) {
> ++          opc == Op_StoreCM || opc == Op_StoreVector) {
> 
> My biggest comment (which is still pretty minor) is about VectorIntrinsics.java.
> 
> The handling of arguments in the intrinsic is irregular and I think this is the
> right moment to make it align better to the other intrinsics.  Specifically, the
> arguments to the lambda should precede the lambda, in the same order as
> they are passed to the lambda.
> 
> This means that the lambda should take the following form:
> 
>      interface ShuffleIotaOperation<E> {
> 
> -        VectorShuffle<E> apply(int step, int length);
> +-        VectorShuffle<E> apply(int start, int step, int length);
> ++        VectorShuffle<E> apply(int length, int start, int step, int
> ++ wrap);
> 
>      }
> 
> I don’t know why length and start got swapped in the first place; now is the
> time to fix it.  Also, wrap should be present to the lambda.  I suggest making
> wrap a true boolean; there’s no reason to lower it to an int, and that loses
> information about the range of the value.
> 
> You can still use two lambdas if you want for the fallback, one for each value
> of wrap.  But I suspect using one would be cleaner.
> 
> (I’ll defer to Vladimir on this; he may have more information about what’s
> right here.)
> 
> The above adjustments might make it convenient to simplify the
> AbstractVector API to have just one method with a boolean rather than two
> methods.
> 
> A couple comments about library_call.cpp:  There are two places where
> mod_val is computed as num_elem-1 and then applied with Op_AndI.  The
> intrinsic must check is_power_of_2(num_elem) and fail the intrinsic if that
> happens.  This may happen with ARM SVE—in theory.
> 
> Also, if step is the constant 1 then there is no need to perform the Op_MulI or
> to build the VectorBlendNode which handles overflow at the end; all indexes
> will be in range.  This case should be covered even though the Java code
> attempts to cover it also.  The optimizer might discover a constant value of
> step=1 that the Java code misses.
> 
> There might be a couple kinds of technical debt here:  First, we want to
> strength-reduce Op_MulI to Op_LShiftI (or nop) when possible.  Does this
> happen downstream of the library_call.cpp code?  Second, we want to keep
> track of what happens when the multiplication overflows because step is
> large, and/or the lane size is small.  That behavior has to be specified and
> followed both in Java code and the intrinsic—and tested.  (Also, the
> VectorBlendNode should nop itself away if the step turns out to be constant 1
> after parsing.  Not sure how hard this is.)
> 
> Reviewed.
> 
> — John



More information about the panama-dev mailing list