[lworld+vector] RFR: 8317699: [lworld+vector] Fix Vector API tests crash with "assert(vbox->is_Phi()) failed: should be phi"

Xiaohong Gong xgong at openjdk.org
Wed Oct 11 03:03:21 UTC 2023


On Tue, 10 Oct 2023 08:50:32 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> Currently several jtreg tests under `test/jdk/jdk/jdk/incubator/vector` crash with following error:
> 
> 
> A fatal error has been detected by the Java Runtime Environment:
> 
> Internal Error (valhalla/src/hotspot/share/opto/vector.cpp:257), pid=2737779, tid=2737796
> assert(false) failed: should be phi
> 
> 
> It happens when expanding a `VectorBoxNode` which has following inputs:
> 
>                         VectorBoxAllocate
>                                |
>                                |
>        VectorBoxAllocate    [2 Proj]
>                |             /   |
>             [1 Proj]       /     |
>                \         /       |
>                  \     /        /
>                    \ /        /
>                  [2 Phi]    /
>                      \    /
>                        \ |
>                      [1 Phi]     InlineType
>                          \      /
>                            \  /
>                          VectorBox
> 
> The compiler will visit and expand all the inputs of the `PhiNode`, and mark them visited. If the input has been visited, it just returns without any action. And the compiler assumes the revisited node should be a PhiNode. In above graph, the first `Phi` (i.e. `[1 Phi]`) has the same input `[2 Proj]` with the second `Phi` (i.e. `[2 Phi]`). So `[2 Proj]` will be visited twice. At the second time it is visited, the assertion fails since it is not a Phi. The case is reasonable in java and the assertion can be removed.
> 
> Removing the assertion can fix the existing issue and no new issues are involved.
> 
> Jdk mainline also has the same issue, but I prefer fixing it here temporarily with following reasons:
>  1) Several jtreg tests crash on this branch while pass on jdk mainline. This blocks our testing and further development.
>  2) We can have more testing with this fix on this branch. And port it to jdk mainline once we think it's a mature/safe change.

> > > You mentioned about this being a problem in mainline also, can you kindly share a test reproducer for it, lworld+vector does diverg from mainline w.r.t VectorBoxNode which are now InlineTypeNode but VectorBoxAllocate is still the same. Pushing InlineTypeNode across phi is also done using _merge_through_phi_ routines.
> > 
> > 
> > Sure, I planed to add the test on jdk mainline once I push this change on that branch.
> 
> We do not see assertion failures in Vector API JTREG tests currently, which means above mentioned graph shape involving VectorBoxAllocate IR node only appears with lworld+vector for same test point. Was curious to know the root cause of this.

The failure case on `lworld+vector` is like `Byte128VectorTests.divByte128VectorTestsBroadcastSmokeTest()`. After I debug and dump the ideal graph, I found it happens when C2 compiler compiles method `ByteVector::lanewise()` https://github.com/openjdk/valhalla/blob/lworld%2Bvector/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L1019 which finally calls into https://github.com/openjdk/valhalla/blob/lworld%2Bvector/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L881.

As we can see, there is some `if-else` branches inside the java code. And the ideal graph matches this code as well, which just like what I printed in above comment.  I tried to write a similar small case like it, and the issue is reproduced. The main code just like following code which is also reproduceable on jdk mainline:

static final VectorSpecies<Byte> B_SPECIES = ByteVector.SPECIES_128;

private static ByteVector func2(ByteVector v, int i) {
        ByteVector v1 = v;
        if (i % 2 == 0 || i % 3 == 0) {
            if (i % 2 == 0) {
                v1 = v1.lanewise(VectorOperators.AND, (byte)5);
            }

            if (i % 8 == 0) {
                v1 = v;
            }
        }
        return v1;
}

public static ByteVector v2;

public static void main(String[] args) {
        ByteVector v = ByteVector.zero(B_SPECIES);

        for (int i = 0; i < 100000; i++) {
            for (int j = 0; j < LENGTH; j += B_SPECIES.length()) {
                ByteVector av = ByteVector.fromArray(B_SPECIES, ia, j);
                v = func2(av, i);
            }
        }

        v2 = v;
}


I guess the reason why the jtreg tests on jdk mainline cannot reproduce is: we added inlinetype on lworld+vector, and the java default code is more complex, which just makes C2 compiles the relative java code with not too much other optimizations like in jdk mainline. And the issue is exposed.

-------------

PR Comment: https://git.openjdk.org/valhalla/pull/937#issuecomment-1756674093



More information about the valhalla-dev mailing list