RFR: 8286125: C2: "bad AD file" with PopulateIndex on x86_64 [v2]

Tue May 10 08:44:49 UTC 2022

On Tue, 10 May 2022 07:59:12 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>> 
>>  - Merge branch 'master' into slpfix
>>  - 8286125: C2: "bad AD file" with PopulateIndex on x86_64
>>    
>>    A fuzzer test reports an assertion failure issue with PopulateIndexNode
>>    on x86_64. It can be reproduced by the new jtreg case inside this patch.
>>    Root cause is that C2 superword creates a PopulateIndexNode by mistake
>>    while vectorizing below loop.
>>    
>>        for (int i = 304; i > 15; i -= 3) {
>>            int c = 16;
>>            do {
>>                for (int t = 1; t < 1; t++) {}
>>                arr[c + 1] >>= i;
>>            } while (--c > 0);
>>        }
>>    
>>    This is a corner loop case with redundant code inside. After several C2
>>    optimizations, the do-while loop inside is unrolled and then isomorphic
>>    right shift statements can be combined in the superword optimization.
>>    Since all shift counts are the same loop IV value `i`, superword should
>>    generate a RShiftCntVNode to create a vector of scalar replications of
>>    the loop IV. But after JDK-8280510, a PopulateIndexNode is generated by
>>    mistake because of the `opd == iv()` condition.
>>    
>>    To fix this, we add a `have_same_inputs` condition here checking if all
>>    inputs at position `opd_idx` of nodes in the pack are the same. If true,
>>    C2 code should NOT run into this block to generate a PopulateIndexNode.
>>    Instead, it should run into the next block for scalar replications.
>>    
>>    Additionally, only adding this condition here is still not good enough
>>    because it breaks the experimental post loop vectorization. As in post
>>    loops, all packs are singleton, i.e., `have_same_inputs` is always true.
>>    Hence, we also add a pack size check here to make post loop logic run
>>    into this block. It's safe to let it go because post loop never needs
>>    scalar replications of the loop IV - it never combines nodes in packs.
>>    
>>    We also add two more assertions in the code.
>>    
>>    Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
>>    are tested and no issue is found.
>
>> As in post loops, all packs are singleton, i.e., have_same_inputs is always true.
> Hence, we also add a pack size check here to make post loop logic run
> into this block. It's safe to let it go because post loop never needs
> scalar replications of the loop IV - it never combines nodes in packs.
> 
> I don't understand why we need to execute that code for post loops. Could you elaborate why the `p->size() == 1` check is needed?

Hi @TobiHartmann ,

Thanks for your review.

> I don't understand why we need to execute that code for post loops. Could you elaborate why the `p->size() == 1` check is needed?

In post loop vectorization, superword doesn't combine multiple scalar nodes to one vector node. Instead, each scalar node is transformed to a vector node with vector mask. If a loop body statement has the induction variable involved, such as

for (int i = 0; i < length; i++) {
    a[i] = i;
}

then superword should run into this `if (opd == iv() && ...) {...}` block to create an index vector for vectorizing post loop. But packs in post loops are special (singletons) - each has only one scalar node inside. So `same_inputs(...)` always returns true. If we just add the `have_same_inputs` condition, post loop vectorization will generate incorrect result because index vectors cannot be created. Hence we add another `p->size() == 1` check here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8587