RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]
Roberto Castañeda Lozano
rcastanedalo at openjdk.org
Wed May 31 07:27:04 UTC 2023
On Wed, 17 May 2023 14:15:09 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
>>
>> - Merge branch 'master' into JDK-8302673
>> - Defer op(x, x) to constant/identity propagation early
>> - Merge branch 'master' into JDK-8302673
>> - Refactor idealization and extracted Identity transformation for clarity
>> - Make auxiliary add operand extraction function return a tuple
>> - Randomize array values in min/max test computation
>> - Merge branch 'master' into JDK-8302673
>> - Merge branch 'master' into JDK-8302673
>> - Refine comments
>> - Update copyright header
>> - ... and 12 more: https://git.openjdk.org/jdk/compare/93a4b339...a6db3cc4
>
> src/hotspot/share/opto/addnode.cpp line 1132:
>
>> 1130: // four possible permutations given by opcode's commutativity) into
>> 1131: // opcode(x + opcode(x_off, y_off), z), where opcode is either MinI or MaxI,
>> 1132: // if x == y and the additions can't overflow.
>
> Ok, effectively we have 5, not just 4 cases here:
>
> opcode(x + x_off, opcode(y + y_off, z))
> opcode(x + x_off, opcode(z, y + y_off))
> opcode(opcode(y + y_off, z), x + x_off)
> opcode(opcode(z, y + y_off), x + x_off)
> opcode(x + x_off, y + y_off)
>
>
> I find the nested for-loop quite confusing. Maybe packing the inner stuff into a separate function could work?
>
>
> // Check for opcode(x + x_con, y + y_con), no z
> if (in(1)->Opcode() == Op_AddI && in(2)->Opcode() == Op_AddI) {
> Node* ret = try_fold(opcode, in(1), in(2), nullptr);
> if (ret != nullptr) { return ret; }
> }
>
> // Check for these 4 cases, equivalent to opcode3(addx, addy, z)
> // opcode(x + x_con, opcode(y + y_con, z))
> // opcode(x + x_con, opcode(z, y + y_con))
> // opcode(opcode(y + y_con, z), x + x_con)
> // opcode(opcode(z, y + y_con), x + x_con)
> for (uint i = 1; i < 2; i++) {
> Node* addx = in(i);
> Node* other = in(i == 1 ? 2 : 1); // or just "2-i"
> if (addx->Opcode() != Op_AddI || other->Opcode() != opcode) { continue; }
> for (uint j = 1; i < 2; j++) {
> Node* addy = other->in(j);
> Node* z = other->in(j == 1 ? 2 : 1);
> if (addy->Opcode() != Op_AddI) { continue; }
> // We have opcode3(addx, addy, z)
> Node* ret = try_fold(opcode, addx, addy, z);
> if (ret != nullptr) { return ret; }
> }
> }
>
> Where we have
>
> Node* try_fold(int opcode, Node* addx, Node* addy, Node* z = nullptr) {
> jint addx_con = 0;
> jint addy_con = 0;
> Node* addx_var = as_add_constant(addx, &addx_con);
> Node* addy_var = as_add_constant(addy, &addy_con);
> if (addx_var == nullptr || addy_var == nullptr) {
> // found a top
> return nullptr;
> }
> // could even check addx_var != addy_var, then we don't have to do that inside...
> Node* folded = extract_addition(phase, addx_var, addx_con, addy_var, addy_con, opcode);
> if (z != nullptr) {
> folded = opcode(folded, z);
> }
> return folded;
> }
>
>
> Maybe this does a few more calls to `as_add_constant` than strictly necessary, but it is a bit easier to understand, right?
Thanks for the feedback, I agree the first refactor was not as clear as it should be, possibly because I derived it quite mechanically from the existing code. I have reworked `MaxNode::IdealI()` using more explicit naming, a functional and simplified version of `as_add_with_constant()`, and a clearer separation of the different optimizations that are applied. Hope it is more readable now.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211204760
More information about the hotspot-compiler-dev
mailing list