[vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Sun Nov 24 03:13:20 UTC 2019
Hi Jatin,
Thanks for incorporating the suggestions. Couple of improvements:
1) In library_call.cpp, if num_elem is not power of 2, return false (no intrinsic).
2) In library_call.cpp, lines 7151 to 7166 only needs to be done if (step != 1 || start != 0) or if either of them is not a constant.
Best Regards,
Sandhya
-----Original Message-----
From: Bhateja, Jatin <jatin.bhateja at intel.com>
Sent: Friday, November 22, 2019 4:20 AM
To: John Rose <john.r.rose at oracle.com>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Cc: panama-dev at openjdk.java.net
Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API
Hi John, Sandhya,
Please find below an updated patch which does suggested strength reductions.
http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webrev.01/
My understanding is that vector-API is a base level API and will be used intelligently by the users, thus apart from macro APIs like iotaShuffle , addIndex (may be few others) compiler based strength reduction may not be needed.
For auto-vectorization flow, scalar graph which is fed to SuperWord is already strength reduced by construction since Node*::Identity routines does the necessary job.
Best Regards,
Jatin
> -----Original Message-----
> From: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Sent: Thursday, November 14, 2019 3:30 AM
> To: John Rose <john.r.rose at oracle.com>; Bhateja, Jatin
> <jatin.bhateja at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: RE: [vector-api][X86] Intrinsic creation for missing cases of
> iotaShuffle API
>
> Hi John,
>
> I like your suggestion to check if the step is constant and has a
> value 1 in library_call.cpp before generating the MulI node. Likewise,
> the AddI node generation can be avoided if the start is 0.
>
> The Blend node generation can only be removed if the start also is 0
> along with step being 1 at parsing time. If start is non zero, then
> index can go out of range. Similarly, the do_wrap path can be optimized for start=0, step=1.
>
> I don’t think any strength reduction is happening today on vector
> nodes after generation.
>
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of
> John Rose
> Sent: Wednesday, November 13, 2019 11:37 AM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector-api][X86] Intrinsic creation for missing cases of
> iotaShuffle API
>
> On Nov 13, 2019, at 7:53 AM, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
> >
> > Hi All,
> >
> > Please find below a patch which intrinsifies following missing cases
> > of iotaShuffle :-
> >
> > 1. scale > 1.
> > 2. wrap = false.
> >
> > http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/we
> > br
> > ev.00/
>
> Thank you Jatin for fixing some of my “FIXME” comments!
>
> In AbstractSpecies.java there seems to be a funny 2-space indentation.
> Please align it to the ambient 4-space convention.
>
> In loopTransform.cpp I would encourage you to make your change more
> incremental.
> Your change re-orders the existing predicate terms while inserting the
> new term at the beginning. It is more conventional to add new logic
> at the end, while retaining the flow of the old logic. I think
> breaking the lines *is* a good idea, so something like this is good:
>
> - if (opc == Op_StoreP || opc == Op_StoreN || opc == Op_StoreNKlass ||
> opc == Op_StoreCM) {
> +- if (opc == Op_StoreVector || opc == Op_StoreP ||
> ++ if (opc == Op_StoreP ||
> + opc == Op_StoreN || opc == Op_StoreNKlass ||
> +- opc == Op_StoreCM) {
> ++ opc == Op_StoreCM || opc == Op_StoreVector) {
>
> My biggest comment (which is still pretty minor) is about VectorIntrinsics.java.
>
> The handling of arguments in the intrinsic is irregular and I think
> this is the right moment to make it align better to the other
> intrinsics. Specifically, the arguments to the lambda should precede
> the lambda, in the same order as they are passed to the lambda.
>
> This means that the lambda should take the following form:
>
> interface ShuffleIotaOperation<E> {
>
> - VectorShuffle<E> apply(int step, int length);
> +- VectorShuffle<E> apply(int start, int step, int length);
> ++ VectorShuffle<E> apply(int length, int start, int step, int
> ++ wrap);
>
> }
>
> I don’t know why length and start got swapped in the first place; now
> is the time to fix it. Also, wrap should be present to the lambda. I
> suggest making wrap a true boolean; there’s no reason to lower it to
> an int, and that loses information about the range of the value.
>
> You can still use two lambdas if you want for the fallback, one for
> each value of wrap. But I suspect using one would be cleaner.
>
> (I’ll defer to Vladimir on this; he may have more information about
> what’s right here.)
>
> The above adjustments might make it convenient to simplify the
> AbstractVector API to have just one method with a boolean rather than
> two methods.
>
> A couple comments about library_call.cpp: There are two places where
> mod_val is computed as num_elem-1 and then applied with Op_AndI. The
> intrinsic must check is_power_of_2(num_elem) and fail the intrinsic if
> that happens. This may happen with ARM SVE—in theory.
>
> Also, if step is the constant 1 then there is no need to perform the
> Op_MulI or to build the VectorBlendNode which handles overflow at the
> end; all indexes will be in range. This case should be covered even
> though the Java code attempts to cover it also. The optimizer might
> discover a constant value of
> step=1 that the Java code misses.
>
> There might be a couple kinds of technical debt here: First, we want
> to strength-reduce Op_MulI to Op_LShiftI (or nop) when possible. Does
> this happen downstream of the library_call.cpp code? Second, we want
> to keep track of what happens when the multiplication overflows
> because step is large, and/or the lane size is small. That behavior
> has to be specified and followed both in Java code and the
> intrinsic—and tested. (Also, the VectorBlendNode should nop itself
> away if the step turns out to be constant 1 after parsing. Not sure
> how hard this is.)
>
> Reviewed.
>
> — John
More information about the panama-dev
mailing list