[vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Thu Nov 14 13:36:37 UTC 2019
> I don’t think any strength reduction is happening today on vector nodes after generation.
Yes, that's my recollection as well. (Probably, the need for peephole
optimizations on vector nodes was less pressing for auto-vectorizer: it
benefits from optimizations on scalar code.)
However, there were some experiments in that direction:
https://bugs.openjdk.java.net/browse/JDK-8219881
Best regards,
Vladimir Ivanov
> -----Original Message-----
> From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of John Rose
> Sent: Wednesday, November 13, 2019 11:37 AM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector-api][X86] Intrinsic creation for missing cases of iotaShuffle API
>
> On Nov 13, 2019, at 7:53 AM, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
>>
>> Hi All,
>>
>> Please find below a patch which intrinsifies following missing cases
>> of iotaShuffle :-
>>
>> 1. scale > 1.
>> 2. wrap = false.
>>
>> http://cr.openjdk.java.net/~jbhateja/vectorIntrinsics/IotaShuffle/webr
>> ev.00/
>
> Thank you Jatin for fixing some of my “FIXME” comments!
>
> In AbstractSpecies.java there seems to be a funny 2-space indentation.
> Please align it to the ambient 4-space convention.
>
> In loopTransform.cpp I would encourage you to make your change more incremental.
> Your change re-orders the existing predicate terms while inserting the new term at the beginning. It is more conventional to add new logic at the end, while retaining the flow of the old logic. I think breaking the lines *is* a good idea, so something like this is good:
>
> - if (opc == Op_StoreP || opc == Op_StoreN || opc == Op_StoreNKlass || opc == Op_StoreCM) {
> +- if (opc == Op_StoreVector || opc == Op_StoreP ||
> ++ if (opc == Op_StoreP ||
> + opc == Op_StoreN || opc == Op_StoreNKlass ||
> +- opc == Op_StoreCM) {
> ++ opc == Op_StoreCM || opc == Op_StoreVector) {
>
> My biggest comment (which is still pretty minor) is about VectorIntrinsics.java.
>
> The handling of arguments in the intrinsic is irregular and I think this is the right moment to make it align better to the other intrinsics. Specifically, the arguments to the lambda should precede the lambda, in the same order as they are passed to the lambda.
>
> This means that the lambda should take the following form:
>
> interface ShuffleIotaOperation<E> {
>
> - VectorShuffle<E> apply(int step, int length);
> +- VectorShuffle<E> apply(int start, int step, int length);
> ++ VectorShuffle<E> apply(int length, int start, int step, int
> ++ wrap);
>
> }
>
> I don’t know why length and start got swapped in the first place; now is the time to fix it. Also, wrap should be present to the lambda. I suggest making wrap a true boolean; there’s no reason to lower it to an int, and that loses information about the range of the value.
>
> You can still use two lambdas if you want for the fallback, one for each value of wrap. But I suspect using one would be cleaner.
>
> (I’ll defer to Vladimir on this; he may have more information about what’s right here.)
>
> The above adjustments might make it convenient to simplify the AbstractVector API to have just one method with a boolean rather than two methods.
>
> A couple comments about library_call.cpp: There are two places where mod_val is computed as num_elem-1 and then applied with Op_AndI. The intrinsic must check is_power_of_2(num_elem) and fail the intrinsic if that happens. This may happen with ARM SVE—in theory.
>
> Also, if step is the constant 1 then there is no need to perform the Op_MulI or to build the VectorBlendNode which handles overflow at the end; all indexes will be in range. This case should be covered even though the Java code attempts to cover it also. The optimizer might discover a constant value of
> step=1 that the Java code misses.
>
> There might be a couple kinds of technical debt here: First, we want to strength-reduce Op_MulI to Op_LShiftI (or nop) when possible. Does this happen downstream of the library_call.cpp code? Second, we want to keep track of what happens when the multiplication overflows because step is large, and/or the lane size is small. That behavior has to be specified and followed both in Java code and the intrinsic—and tested. (Also, the VectorBlendNode should nop itself away if the step turns out to be constant 1 after parsing. Not sure how hard this is.)
>
> Reviewed.
>
> — John
>
More information about the panama-dev
mailing list