[vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API
Bhateja, Jatin
jatin.bhateja at intel.com
Fri Nov 22 12:20:16 UTC 2019
Hi John, Sandhya,
Please find below an updated patch which does suggested strength reductions.
http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webrev.01/
My understanding is that vector-API is a base level API and will be used intelligently by the users,
thus apart from macro APIs like iotaShuffle , addIndex (may be few others) compiler based
strength reduction may not be needed.
For auto-vectorization flow, scalar graph which is fed to SuperWord is already strength reduced
by construction since Node*::Identity routines does the necessary job.
Best Regards,
Jatin
> -----Original Message-----
> From: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Sent: Thursday, November 14, 2019 3:30 AM
> To: John Rose <john.r.rose at oracle.com>; Bhateja, Jatin
> <jatin.bhateja at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle
> API
>
> Hi John,
>
> I like your suggestion to check if the step is constant and has a value 1 in
> library_call.cpp before generating the MulI node. Likewise, the AddI node
> generation can be avoided if the start is 0.
>
> The Blend node generation can only be removed if the start also is 0 along
> with step being 1 at parsing time. If start is non zero, then index can go out of
> range. Similarly, the do_wrap path can be optimized for start=0, step=1.
>
> I don’t think any strength reduction is happening today on vector nodes after
> generation.
>
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of
> John Rose
> Sent: Wednesday, November 13, 2019 11:37 AM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle
> API
>
> On Nov 13, 2019, at 7:53 AM, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
> >
> > Hi All,
> >
> > Please find below a patch which intrinsifies following missing cases
> > of iotaShuffle :-
> >
> > 1. scale > 1.
> > 2. wrap = false.
> >
> > http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webr
> > ev.00/
>
> Thank you Jatin for fixing some of my “FIXME” comments!
>
> In AbstractSpecies.java there seems to be a funny 2-space indentation.
> Please align it to the ambient 4-space convention.
>
> In loopTransform.cpp I would encourage you to make your change more
> incremental.
> Your change re-orders the existing predicate terms while inserting the new
> term at the beginning. It is more conventional to add new logic at the end,
> while retaining the flow of the old logic. I think breaking the lines *is* a good
> idea, so something like this is good:
>
> - if (opc == Op_StoreP || opc == Op_StoreN || opc == Op_StoreNKlass ||
> opc == Op_StoreCM) {
> +- if (opc == Op_StoreVector || opc == Op_StoreP ||
> ++ if (opc == Op_StoreP ||
> + opc == Op_StoreN || opc == Op_StoreNKlass ||
> +- opc == Op_StoreCM) {
> ++ opc == Op_StoreCM || opc == Op_StoreVector) {
>
> My biggest comment (which is still pretty minor) is about VectorIntrinsics.java.
>
> The handling of arguments in the intrinsic is irregular and I think this is the
> right moment to make it align better to the other intrinsics. Specifically, the
> arguments to the lambda should precede the lambda, in the same order as
> they are passed to the lambda.
>
> This means that the lambda should take the following form:
>
> interface ShuffleIotaOperation<E> {
>
> - VectorShuffle<E> apply(int step, int length);
> +- VectorShuffle<E> apply(int start, int step, int length);
> ++ VectorShuffle<E> apply(int length, int start, int step, int
> ++ wrap);
>
> }
>
> I don’t know why length and start got swapped in the first place; now is the
> time to fix it. Also, wrap should be present to the lambda. I suggest making
> wrap a true boolean; there’s no reason to lower it to an int, and that loses
> information about the range of the value.
>
> You can still use two lambdas if you want for the fallback, one for each value
> of wrap. But I suspect using one would be cleaner.
>
> (I’ll defer to Vladimir on this; he may have more information about what’s
> right here.)
>
> The above adjustments might make it convenient to simplify the
> AbstractVector API to have just one method with a boolean rather than two
> methods.
>
> A couple comments about library_call.cpp: There are two places where
> mod_val is computed as num_elem-1 and then applied with Op_AndI. The
> intrinsic must check is_power_of_2(num_elem) and fail the intrinsic if that
> happens. This may happen with ARM SVE—in theory.
>
> Also, if step is the constant 1 then there is no need to perform the Op_MulI or
> to build the VectorBlendNode which handles overflow at the end; all indexes
> will be in range. This case should be covered even though the Java code
> attempts to cover it also. The optimizer might discover a constant value of
> step=1 that the Java code misses.
>
> There might be a couple kinds of technical debt here: First, we want to
> strength-reduce Op_MulI to Op_LShiftI (or nop) when possible. Does this
> happen downstream of the library_call.cpp code? Second, we want to keep
> track of what happens when the multiplication overflows because step is
> large, and/or the lane size is small. That behavior has to be specified and
> followed both in Java code and the intrinsic—and tested. (Also, the
> VectorBlendNode should nop itself away if the step turns out to be constant 1
> after parsing. Not sure how hard this is.)
>
> Reviewed.
>
> — John
More information about the panama-dev
mailing list