RFR: 8286125: C2: "bad AD file" with PopulateIndex on x86_64 [v2]
Pengfei Li
pli at openjdk.java.net
Tue May 10 08:44:49 UTC 2022
On Tue, 10 May 2022 07:59:12 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:
>> Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>>
>> - Merge branch 'master' into slpfix
>> - 8286125: C2: "bad AD file" with PopulateIndex on x86_64
>>
>> A fuzzer test reports an assertion failure issue with PopulateIndexNode
>> on x86_64. It can be reproduced by the new jtreg case inside this patch.
>> Root cause is that C2 superword creates a PopulateIndexNode by mistake
>> while vectorizing below loop.
>>
>> for (int i = 304; i > 15; i -= 3) {
>> int c = 16;
>> do {
>> for (int t = 1; t < 1; t++) {}
>> arr[c + 1] >>= i;
>> } while (--c > 0);
>> }
>>
>> This is a corner loop case with redundant code inside. After several C2
>> optimizations, the do-while loop inside is unrolled and then isomorphic
>> right shift statements can be combined in the superword optimization.
>> Since all shift counts are the same loop IV value `i`, superword should
>> generate a RShiftCntVNode to create a vector of scalar replications of
>> the loop IV. But after JDK-8280510, a PopulateIndexNode is generated by
>> mistake because of the `opd == iv()` condition.
>>
>> To fix this, we add a `have_same_inputs` condition here checking if all
>> inputs at position `opd_idx` of nodes in the pack are the same. If true,
>> C2 code should NOT run into this block to generate a PopulateIndexNode.
>> Instead, it should run into the next block for scalar replications.
>>
>> Additionally, only adding this condition here is still not good enough
>> because it breaks the experimental post loop vectorization. As in post
>> loops, all packs are singleton, i.e., `have_same_inputs` is always true.
>> Hence, we also add a pack size check here to make post loop logic run
>> into this block. It's safe to let it go because post loop never needs
>> scalar replications of the loop IV - it never combines nodes in packs.
>>
>> We also add two more assertions in the code.
>>
>> Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
>> are tested and no issue is found.
>
>> As in post loops, all packs are singleton, i.e., have_same_inputs is always true.
> Hence, we also add a pack size check here to make post loop logic run
> into this block. It's safe to let it go because post loop never needs
> scalar replications of the loop IV - it never combines nodes in packs.
>
> I don't understand why we need to execute that code for post loops. Could you elaborate why the `p->size() == 1` check is needed?
Hi @TobiHartmann ,
Thanks for your review.
> I don't understand why we need to execute that code for post loops. Could you elaborate why the `p->size() == 1` check is needed?
In post loop vectorization, superword doesn't combine multiple scalar nodes to one vector node. Instead, each scalar node is transformed to a vector node with vector mask. If a loop body statement has the induction variable involved, such as
for (int i = 0; i < length; i++) {
a[i] = i;
}
then superword should run into this `if (opd == iv() && ...) {...}` block to create an index vector for vectorizing post loop. But packs in post loops are special (singletons) - each has only one scalar node inside. So `same_inputs(...)` always returns true. If we just add the `have_same_inputs` condition, post loop vectorization will generate incorrect result because index vectors cannot be created. Hence we add another `p->size() == 1` check here.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8587
More information about the hotspot-compiler-dev
mailing list