RFR: 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop
Christian Hagedorn
chagedorn at openjdk.org
Wed Sep 20 13:36:16 UTC 2023
When currently hoisting a range check out of a loop in Loop Predication, we create two Hoisted Check Predicate (one for the init value and one for the last iteration value) together with two Template Assertion Predicate (one for the init value and one for the last iteration value). Any data dependency on a hoisted range check (for which `depends_only_on_test()` is true) is rewired to the second Template Assertion Predicate:

- `487 RangeCheck` + `496 RangeCheck`: Hoisted Check Predicate
- `405 If` + `416 Range Check`: Template Assertion Predicate
- `218 LoadI` rewired data dependency from hoisted range check inside the loop
When creating pre/main/post loops, we rewire all data dependencies found at Template Assertion Predicates which are part of the original loop body (and therefore cloned and part of the pre, main, and post loop) to the main and post loop. Currently, they all end up at the zero trip guard of the main loop and post loop (image shows the main loop zero trip guard):

When further unrolling the main loop, all cloned data dependencies stay at the zero trip guard. For example, after unrolling three times, we have all the cloned loads still at the zero trip guard:

The Java code for the graphs above is:
static void test(int[] a, int[] b) {
for (int i = 0; i < 1024; i+=2) {
a[i+0] += b[i+0];
a[i+1] += b[i+1];
}
}
We can apply Superword and vectorize the loads.
However, if we ever turn a main loop (before Superword) into a normal loop again and split it further (peel, unswitch, pre/main/post etc.). We would need to create new Assertion Predicates for the split loops and rewire the data dependencies to them. Both is currently not done and known to lead to failures. This should be fixed with JDK-8288981.
In JDK-8288981, I introduce a new `TemplateAssertionPredicateNodes` which replaces the two Template Assertion Predicates with their `Opaque4Nodes`. This means that for each hoisted range check, we create a single `TemplateAssertionPredicateNode`:

For the Java code in this example, we hoist four range checks and therefore create four `TemplateAssertionPredicateNodes`.
To fix the data dependency problem when splitting a loop, I keep the data dependencies at the cloned `TemplateAssertionPredicateNodes`. For example, the graph of this example with four hoisted checks looks like this after unrolling three times:

We've kept the data dependencies of all four hoisted range checks at their `TemplateAssertionPredicateNodes` instead of collecting them at the zero trip guard.
But this now prevents Superword from vectorizing these nodes because the `isomorphic()` check of `a[i+0]` and `a[i+1]` fails: They do not have the same control node - even though both are pinned outside the loop. There is some special logic inside `isomorphic()`, though, which still allows the check to pass if both are pinned at range checks (not the case here because they are pinned at `TemplateAssertionPredicateNodes`):
https://github.com/openjdk/jdk/blob/242eeaea47a259cab4ad2d4f0e055959e9870b8d/src/hotspot/share/opto/superword.cpp#L1280-L1292
or if a `MulAddS2I` node is involved (also not the case):
https://github.com/openjdk/jdk/blob/242eeaea47a259cab4ad2d4f0e055959e9870b8d/src/hotspot/share/opto/superword.cpp#L1293-L1305
Since both fail, we conclude that the loads are not isomorphic and we fail to vectorize the loads.
This additional special logic seems odd. We should always be able to vectorize pinned nodes as long as they are loop invariant (i.e. pinned outside the loop). I therefore propose to change this logic to only check for loop invariance, regardless of any pins. Additionally, Superword should be strong enough to prevent vectorization if there is problem with data dependencies inside and among the packs.
I've added an IR test to make sure that `MulAddS2I` nodes are still vectorized.
Thanks,
Christian
-------------
Commit messages:
- 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop
Changes: https://git.openjdk.org/jdk/pull/15842/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15842&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8310886
Stats: 142 lines in 3 files changed: 106 ins; 29 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/15842.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15842/head:pull/15842
PR: https://git.openjdk.org/jdk/pull/15842
More information about the hotspot-compiler-dev
mailing list