RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5]
Emanuel Peter
epeter at openjdk.org
Fri May 12 06:47:52 UTC 2023
On Fri, 12 May 2023 01:10:29 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> use is_counted and is_innermost
>
> src/hotspot/share/opto/loopopts.cpp line 4210:
>
>> 4208: if (use != phi && ctrl_or_self(use) == cl) {
>> 4209: DEBUG_ONLY( current->dump(-1); )
>> 4210: assert(false, "reduction has use inside loop");
>
> I have been wondering: it is right to bail out of the optimization here, but why do we assert? It is perfectly legal (if not very meaningful) to have a scalar use of the last unordered reduction within the loop. This will still auto-vectorize, since the reduction is to a scalar. For example, a slight modification of SumRed_Int.java still auto-vectorizes and has a use of the last unordered reduction within the loop:
>     public static int sumReductionImplement(
>             int[] a,
>             int[] b,
>             int[] c,
>             int total) {
>         int sum = 0;
>         for (int i = 0; i < a.length; i++) {
>             total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>             sum = total + i;
>         }
>         return total + sum;
>     }
> Do you think this is a valid concern?
I agree, the assert is not strictly necessary, but I'd rather have one assert too many in there and find out which cases I missed when the fuzzer eventually hits one. If you prefer, I can also just remove the assert.
I wrote this `Test.java`:
class Test {
    static final int RANGE = 1024;
    static final int ITER = 10_000;

    static void init(int[] data) {
        for (int i = 0; i < RANGE; i++) {
            data[i] = i + 1;
        }
    }

    static int test(int[] data, int sum) {
        int x = 0;
        for (int i = 0; i < RANGE; i++) {
            sum += 11 * data[i];
            x = sum & i; // what happens with this AndI ?
        }
        return sum + x;
    }

    public static void main(String[] args) {
        int[] data = new int[RANGE];
        init(data);
        for (int i = 0; i < ITER; i++) {
            test(data, i);
        }
    }
}
And ran it like this, with my patch:
./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+TraceNewVectors -XX:+TraceSuperWord Test.java
Everything vectorized as usual. But what happens with the `AndI`? It actually drops outside the loop: its left input is the `AddReductionVI`, and its right input is `(Phi #tripcount) + 63` (so the last `i` is also already computed outside the loop).
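
To illustrate at the source level why the `AndI` can leave the loop, here is a minimal sketch. It is not C2 output and not the code of the patch. The class `SinkSketch` and the methods `inLoop` and `sunk` are hypothetical names introduced only for this illustration. The sketch assumes `RANGE > 0`, as in `Test.java`: `x` is overwritten in every iteration, so only the value from the last iteration (index `RANGE - 1`, final `sum`) is observable after the loop.

```java
// Hypothetical illustration only: SinkSketch, inLoop and sunk are made-up names.
class SinkSketch {
    static final int RANGE = 1024;

    // Same shape as Test.test: the AND is written inside the loop.
    static int inLoop(int[] data, int sum) {
        int x = 0;
        for (int i = 0; i < RANGE; i++) {
            sum += 11 * data[i];
            x = sum & i; // overwritten every iteration, only the last value survives
        }
        return sum + x;
    }

    // Hand-written equivalent: the AND is done once after the loop,
    // using the final sum and the last index. Assumes RANGE > 0.
    static int sunk(int[] data, int sum) {
        for (int i = 0; i < RANGE; i++) {
            sum += 11 * data[i];
        }
        int x = sum & (RANGE - 1);
        return sum + x;
    }

    public static void main(String[] args) {
        int[] data = new int[RANGE];
        for (int i = 0; i < RANGE; i++) {
            data[i] = i + 1;
        }
        // Compare both versions for a range of initial sums.
        for (int start = 0; start < 100; start++) {
            if (inLoop(data, start) != sunk(data, start)) {
                throw new AssertionError("mismatch for start = " + start);
            }
        }
        System.out.println("in-loop and sunk versions agree");
    }
}
```

With the scalar use moved after the loop like this, the loop body only contains the reduction itself, which matches the observation that nothing but the reduction remains inside the vectorized loop.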
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191971362