RFR: 8312980: C2: "malformed control flow" created during incremental inlining [v4]

Wed Oct 11 09:19:15 UTC 2023

On Tue, 10 Oct 2023 09:47:20 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> Hi @rwestrel  thanks for analyzing this bug!
>> 
>> If I understand your algo correctly, you follow down the uses over all paths. But it is not a DFS, you might visit nodes multiple times if there are multiple paths to it. Can there be multiple paths? And if they fork and join many times, there could be an exponential explosion of paths.
>> 
>> If you find some use that is dominated by `ctl`, then you clone the path. You do this again and again, possibly with many paths. You possibly clone nodes multiple times if there is overlap between the paths.
>> 
>> So what exactly happens if there are 2 paths to a node that is dominated by `ctl`? Are eventually all inputs properly and consistently replaced? Can you help me get convinced here? Does it somehow rely on IGVN to common some cloned nodes again?
>> 
>> **An alternative**: First do a down-BFS starting at all replaced nodes, and find all candidate nodes for cloning. The boundary of the down-BFS is at Phi, CFG or pinned nodes, but goes through all other nodes. You mark the boundary nodes that need their inputs cloned.
>> Now you do a up-BFS starting at these marked nodes through the candidate nodes only, to find  which nodes you actually need to clone.
>> This way, you only need 2 BFS, the cost is cleanly limited. And I feel like correctness would be easier to understand/prove.
>
>> If I understand your algo correctly, you follow down the uses over all paths. But it is not a DFS, you might visit nodes multiple times if there are multiple paths to it. Can there be multiple paths? And if they fork and join many times, there could be an exponential explosion of paths.
> 
> Yes, that's right.
> 
>> If you find some use that is dominated by `ctl`, then you clone the path. You do this again and again, possibly with many paths. You possibly clone nodes multiple times if there is overlap between the paths.
> 
> That's correct.
> 
>> So what exactly happens if there are 2 paths to a node that is dominated by `ctl`? Are eventually all inputs properly and consistently replaced? Can you help me get convinced here? Does it somehow rely on IGVN to common some cloned nodes again?
> 
> It does rely on igvn. The traversal stops at CFG nodes or pinned nodes. All nodes in between are freely floating nodes. Most will be commoned. There might be some exceptions (`SubTypeCheck` nodes wouldn't be commoned by igvn) but they are harmless as none of these nodes have side effects. 
> 
>> **An alternative**: First do a down-BFS starting at all replaced nodes, and find all candidate nodes for cloning. The boundary of the down-BFS is at Phi, CFG or pinned nodes, but goes through all other nodes. You mark the boundary nodes that need their inputs cloned. Now you do a up-BFS starting at these marked nodes through the candidate nodes only, to find which nodes you actually need to clone. This way, you only need 2 BFS, the cost is cleanly limited. And I feel like correctness would be easier to understand/prove.
> 
> The classical c2 way would be to keep a map of old and cloned nodes to avoid duplicates. But unless the current implementation is demonstrated to be too expensive I don't see the need to make it more complicated. I don't believe there's a correctness issue given all nodes with a side effect should be pinned so no node with a side effect will be duplicated.

@rwestrel It took me a day, but I found an example:

./java  -XX:+AlwaysIncrementalInline -XX:CompileCommand=printcompilation,Test::* -XX:CompileCommand=compileonly,Test::* -XX:+PrintInlining -Xbatch Test.java

public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) {
            test("" + i);
        }
    }

    public static int test(String s) {
        int result = 0;
        int len = s.length();
        int i = 0;
        while (i < len) {
            // charAt is inlined late, and i is constrained by CastII(i >= 0)
            // The constraint comes from intrinsic checkIndex
            s.charAt(i);
            // Graph below intentionally branches out 4x, and merges again (4-fold diamonds).
            // This creates an exponential explosion in number of paths.
            int e = i;
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            // Comment out lines below to make it not assert
            // assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            e = (e & 7) + (e & 31) + (e & 1111) + (e & 1000_000);
            result += e;
            i++;
        }
        return result;
    }
}

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/oracle-work/jdk-fork2/open/src/hotspot/share/opto/node.cpp:78), pid=2027031, tid=2027045
#  assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15905#issuecomment-1757232588