RFR: 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop [v5]
Vladimir Kozlov
kvn at openjdk.org
Wed Nov 8 18:42:04 UTC 2023
On Wed, 8 Nov 2023 09:57:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:
>> When currently hoisting a range check out of a loop in Loop Predication, we create two Hoisted Check Predicate (one for the init value and one for the last iteration value) together with two Template Assertion Predicate (one for the init value and one for the last iteration value). Any data dependency on a hoisted range check (for which `depends_only_on_test()` is true) is rewired to the second Template Assertion Predicate:
>>
>> 
>> - `487 RangeCheck` + `496 RangeCheck`: Hoisted Check Predicate
>> - `405 If` + `416 Range Check`: Template Assertion Predicate
>> - `218 LoadI` rewired data dependency from hoisted range check inside the loop
>>
>> When creating pre/main/post loops, we rewire all data dependencies found at Template Assertion Predicates which are part of the original loop body (and therefore cloned and part of the pre, main, and post loop) to the main and post loop. Currently, they all end up at the zero trip guard of the main loop and post loop (image shows the main loop zero trip guard):
>>
>> 
>>
>> When further unrolling the main loop, all cloned data dependencies stay at the zero trip guard. For example, after unrolling three times, we have all the cloned loads still at the zero trip guard:
>>
>> 
>>
>> The Java code for the graphs above is:
>>
>> static void test(int[] a, int[] b) {
>> for (int i = 0; i < 1024; i+=2) {
>> a[i+0] += b[i+0];
>> a[i+1] += b[i+1];
>> }
>> }
>>
>> We can apply Superword and vectorize the loads.
>>
>> However, if we ever turn a main loop (before Superword) into a normal loop again and split it further (peel, unswitch, pre/main/post etc.). We would need to create new Assertion Predicates for the split loops and rewire the data dependencies to them. Both is currently not done and known to lead to failures. This should be fixed with JDK-8288981.
>>
>> In JDK-8288981, I introduce a new `TemplateAssertionPredicateNodes` which replaces the two Template Assertion Predicates with their `Opaque4Nodes`. This means that for each hoisted range check, we create a single `TemplateAssertionPredicateNode`:
>>
>> 
>>
>> For the Java ...
>
> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Change IRNode to use JDK-8310308 features
> - Merge branch 'master' into JDK-8310886
> - Restrict test to aarch64 and x86_64 since match rule for Op_MulAddVS2VI is only defined on these platforms
> - Only run test on 64-bit because match rule for Op_MulAddVS2VI is only defined on 64-bit
> - Add IR rule for VNNI instruction
> - 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop
Last update is good.
-------------
Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15842#pullrequestreview-1721038238
More information about the hotspot-compiler-dev
mailing list