RFR: 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop
Vladimir Kozlov
kvn at openjdk.org
Wed Sep 20 18:53:40 UTC 2023
On Wed, 20 Sep 2023 13:10:27 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:
> When currently hoisting a range check out of a loop in Loop Predication, we create two Hoisted Check Predicate (one for the init value and one for the last iteration value) together with two Template Assertion Predicate (one for the init value and one for the last iteration value). Any data dependency on a hoisted range check (for which `depends_only_on_test()` is true) is rewired to the second Template Assertion Predicate:
>
> 
> - `487 RangeCheck` + `496 RangeCheck`: Hoisted Check Predicate
> - `405 If` + `416 Range Check`: Template Assertion Predicate
> - `218 LoadI` rewired data dependency from hoisted range check inside the loop
>
> When creating pre/main/post loops, we rewire all data dependencies found at Template Assertion Predicates which are part of the original loop body (and therefore cloned and part of the pre, main, and post loop) to the main and post loop. Currently, they all end up at the zero trip guard of the main loop and post loop (image shows the main loop zero trip guard):
>
> 
>
> When further unrolling the main loop, all cloned data dependencies stay at the zero trip guard. For example, after unrolling three times, we have all the cloned loads still at the zero trip guard:
>
> 
>
> The Java code for the graphs above is:
>
> static void test(int[] a, int[] b) {
> for (int i = 0; i < 1024; i+=2) {
> a[i+0] += b[i+0];
> a[i+1] += b[i+1];
> }
> }
>
> We can apply Superword and vectorize the loads.
>
> However, if we ever turn a main loop (before Superword) into a normal loop again and split it further (peel, unswitch, pre/main/post etc.). We would need to create new Assertion Predicates for the split loops and rewire the data dependencies to them. Both is currently not done and known to lead to failures. This should be fixed with JDK-8288981.
>
> In JDK-8288981, I introduce a new `TemplateAssertionPredicateNodes` which replaces the two Template Assertion Predicates with their `Opaque4Nodes`. This means that for each hoisted range check, we create a single `TemplateAssertionPredicateNode`:
>
> 
>
> For the Java code in this example, we hoist four range checks and therefore cre...
Are you sure that VNNI instruction is used in your test? I don't see VNNI CPU feature check in the test.
Here is instruction:
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8744
-------------
PR Review: https://git.openjdk.org/jdk/pull/15842#pullrequestreview-1636300252
More information about the hotspot-compiler-dev
mailing list