RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]

Wed May 31 07:27:04 UTC 2023

On Wed, 17 May 2023 14:15:09 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8302673
>>  - Defer op(x, x) to constant/identity propagation early
>>  - Merge branch 'master' into JDK-8302673
>>  - Refactor idealization and extracted Identity transformation for clarity
>>  - Make auxiliary add operand extraction function return a tuple
>>  - Randomize array values in min/max test computation
>>  - Merge branch 'master' into JDK-8302673
>>  - Merge branch 'master' into JDK-8302673
>>  - Refine comments
>>  - Update copyright header
>>  - ... and 12 more: https://git.openjdk.org/jdk/compare/93a4b339...a6db3cc4
>
> src/hotspot/share/opto/addnode.cpp line 1132:
> 
>> 1130:   // four possible permutations given by opcode's commutativity) into
>> 1131:   // opcode(x + opcode(x_off, y_off), z), where opcode is either MinI or MaxI,
>> 1132:   // if x == y and the additions can't overflow.
> 
> Ok, effectively we have 5, not just 4 cases here:
> 
> opcode(x + x_off, opcode(y + y_off, z))
> opcode(x + x_off, opcode(z, y + y_off))
> opcode(opcode(y + y_off, z), x + x_off)
> opcode(opcode(z, y + y_off), x + x_off)
> opcode(x + x_off, y + y_off)
> 
> 
> I find the nested for-loop quite confusing. Maybe packing the inner stuff into a separate function could work?
> 
> 
> // Check for opcode(x + x_con,  y + y_con), no z
> if (in(1)->Opcode() == Op_AddI && in(2)->Opcode() == Op_AddI) {
>   Node* ret = try_fold(opcode, in(1), in(2), nullptr);
>   if (ret != nullptr) { return ret; }
> }
> 
> // Check for these 4 cases, equivalent to opcode3(addx, addy, z)
> // opcode(x + x_con, opcode(y + y_con, z))
> // opcode(x + x_con, opcode(z, y + y_con))
> // opcode(opcode(y + y_con, z), x + x_con)
> // opcode(opcode(z, y + y_con), x + x_con)
> for (uint i = 1; i < 2; i++) {
>   Node* addx = in(i);
>   Node* other = in(i == 1 ? 2 : 1); // or just "2-i"
>   if (addx->Opcode() != Op_AddI || other->Opcode() != opcode) { continue; }
>   for (uint j = 1; i < 2; j++) {
>     Node* addy = other->in(j);
>     Node* z = other->in(j == 1 ? 2 : 1);
>     if (addy->Opcode() != Op_AddI) { continue; }
>     // We have opcode3(addx, addy, z)
>     Node* ret = try_fold(opcode, addx, addy, z);
>     if (ret != nullptr) { return ret; }
>   }
> }
> 
> Where we have
> 
> Node* try_fold(int opcode, Node* addx, Node* addy, Node* z = nullptr) {
>   jint addx_con = 0;
>   jint addy_con = 0;
>   Node* addx_var = as_add_constant(addx, &addx_con);
>   Node* addy_var = as_add_constant(addy, &addy_con);
>   if (addx_var == nullptr || addy_var == nullptr) {
>     // found a top
>     return nullptr;
>   }
>   // could even check addx_var != addy_var, then we don't have to do that inside...
>   Node* folded = extract_addition(phase, addx_var, addx_con, addy_var, addy_con, opcode);
>   if (z != nullptr) {
>     folded = opcode(folded, z);
>   }
>   return folded;
> }
> 
> 
> Maybe this does a few more calls to `as_add_constant` than strictly necessary, but it is a bit easier to understand, right?

Thanks for the feedback, I agree the first refactor was not as clear as it should be, possibly because I derived it quite mechanically from the existing code. I have reworked `MaxNode::IdealI()` using more explicit naming, a functional and simplified version of `as_add_with_constant()`, and a clearer separation of the different optimizations that are applied. Hope it is more readable now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211204760