RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]

Wed May 31 11:11:03 UTC 2023

On Wed, 31 May 2023 07:14:18 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation:
>> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`.
>> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`.
>> 
>> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`:
>> 
>> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c)
>> 
>> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`:
>> 
>> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c)
>> 
>> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized.
>> 
>> #### Testing
>> 
>> ##### Functionality
>> 
>> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
>> 
>> ##### Performance
>> 
>> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed.
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8302673
>  - Defer op(x, x) to constant/identity propagation early
>  - Merge branch 'master' into JDK-8302673
>  - Refactor idealization and extracted Identity transformation for clarity
>  - Make auxiliary add operand extraction function return a tuple
>  - Randomize array values in min/max test computation
>  - Merge branch 'master' into JDK-8302673
>  - Merge branch 'master' into JDK-8302673
>  - Refine comments
>  - Update copyright header
>  - ... and 12 more: https://git.openjdk.org/jdk/compare/36eaed9c...a6db3cc4

src/hotspot/share/opto/addnode.cpp line 1141:

> 1139:   }
> 1140:   return ConstAddOperands(x, c_type->is_int()->get_con());
> 1141: }

This is what it was on my last review:

// Return:
// <x, C>,       if n is of the form x + C, where 'C' is a non-TOP constant;
// <nullptr, _>, if n is of the form x + C, where 'C' is a TOP constant;
// <n, con>      otherwise.
static Node* constant_add_input(Node* n, jint* con) {
  if (n->Opcode() == Op_AddI && n->in(2)->is_Con()) {
    const Type* t = n->in(2)->bottom_type();
    if (t == Type::TOP) {
      return nullptr;
    }
    *con = t->is_int()->get_con();
    n = n->in(1);
  }
  return n;
}

Here, you used to also allow packing just a single `n`, and leave the constant as `zero`. Did you remove this possibility on purpose? Now `n` must be an `AddI`.

This used to allow cases like this to be folded:
`max(max(a, b),  a + 1)      ->      max(a + max(0, 1), b)`

Or am I missing something? Do you have tests for this case?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211536265