RFR: 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop [v4]
Emanuel Peter
epeter at openjdk.org
Fri Sep 22 07:44:14 UTC 2023
On Thu, 21 Sep 2023 13:53:26 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:
>> When currently hoisting a range check out of a loop in Loop Predication, we create two Hoisted Check Predicate (one for the init value and one for the last iteration value) together with two Template Assertion Predicate (one for the init value and one for the last iteration value). Any data dependency on a hoisted range check (for which `depends_only_on_test()` is true) is rewired to the second Template Assertion Predicate:
>>
>> 
>> - `487 RangeCheck` + `496 RangeCheck`: Hoisted Check Predicate
>> - `405 If` + `416 Range Check`: Template Assertion Predicate
>> - `218 LoadI` rewired data dependency from hoisted range check inside the loop
>>
>> When creating pre/main/post loops, we rewire all data dependencies found at Template Assertion Predicates which are part of the original loop body (and therefore cloned and part of the pre, main, and post loop) to the main and post loop. Currently, they all end up at the zero trip guard of the main loop and post loop (image shows the main loop zero trip guard):
>>
>> 
>>
>> When further unrolling the main loop, all cloned data dependencies stay at the zero trip guard. For example, after unrolling three times, we have all the cloned loads still at the zero trip guard:
>>
>> 
>>
>> The Java code for the graphs above is:
>>
>> static void test(int[] a, int[] b) {
>> for (int i = 0; i < 1024; i+=2) {
>> a[i+0] += b[i+0];
>> a[i+1] += b[i+1];
>> }
>> }
>>
>> We can apply Superword and vectorize the loads.
>>
>> However, if we ever turn a main loop (before Superword) into a normal loop again and split it further (peel, unswitch, pre/main/post etc.). We would need to create new Assertion Predicates for the split loops and rewire the data dependencies to them. Both is currently not done and known to lead to failures. This should be fixed with JDK-8288981.
>>
>> In JDK-8288981, I introduce a new `TemplateAssertionPredicateNodes` which replaces the two Template Assertion Predicates with their `Opaque4Nodes`. This means that for each hoisted range check, we create a single `TemplateAssertionPredicateNode`:
>>
>> 
>>
>> For the Java ...
>
> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>
> Restrict test to aarch64 and x86_64 since match rule for Op_MulAddVS2VI is only defined on these platforms
Looks good, I just had a few minor comments.
test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 917:
> 915: public static final String MUL_ADD_VS2VI = PREFIX + "MUL_ADD_VS2VI" + POSTFIX;
> 916: static {
> 917: superWordNodes(MUL_ADD_VS2VI, "MulAddVS2VI");
Is there a reason this is not a `vectorNode`?
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java line 50:
> 48: for (int i = 0; i < RANGE; i++) {
> 49: sArr1[i] = (short)(i + (1000 * AbstractInfo.getRandom().nextDouble()));
> 50: sArr2[i] = (short)(RANGE - i + (2000 * AbstractInfo.getRandom().nextDouble()));
Did you want to change this to `(short)AbstractInfo.getRandom().nextInt();`
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java line 83:
> 81: failOn = {IRNode.MUL_ADD_VS2VI}, // Can only pack LoadS if UseUnalignedLoadStores is true (default if sse4.2)
> 82: counts = {IRNode.MUL_ADD_S2I, "> 0"})
> 83: @IR(applyIfCPUFeature = {"asimd", "true"}, applyIf = {"MaxVectorSize", "16"},
Why `"MaxVectorSize", "16"`?
-------------
Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15842#pullrequestreview-1639333472
PR Review Comment: https://git.openjdk.org/jdk/pull/15842#discussion_r1334004038
PR Review Comment: https://git.openjdk.org/jdk/pull/15842#discussion_r1334004976
PR Review Comment: https://git.openjdk.org/jdk/pull/15842#discussion_r1334006239
More information about the hotspot-compiler-dev
mailing list