RFR: 8312980: C2: "malformed control flow" created during incremental inlining [v4]

Tue Oct 10 09:12:01 UTC 2023

On Wed, 4 Oct 2023 12:38:25 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> When `testHelper()` is late inlined, `o` is recorded in the replaced
>> nodes because of the cast to `B` in that method. When late inlining
>> finishes, c2 tries to replace `o` with the resulting `CheckCastPP` to
>> `B` for uses dominated by the call. ``test` and `testHelper2` have 2
>> type checks for A: there are 2 `CheckCastPP` nodes pinned at 2
>> different controls (one is dominated by the `testHelper()` call, one
>> is not), a single `CmpP` that's shared by the 2 type checks (one of
>> the type check is dominated, the other is not). To check if a
>> replacement is legal, for each use of the node to be replaced, the
>> code in `ReplacedNodes::apply()` follow uses and uses of uses until it
>> finds a node that's a CFG node or is pinned. It then uses the IGVN
>> heuristics to figure out if the CFG is dominated by the call or
>> not. If it finds a CFG node that is not dominated, then that use is
>> skipped.
>> 
>> What happens here is that `ReplacedNodes::apply()` checks the control
>> input for each `CheckCastPP` and finds one to be dominated and the
>> other not. So it performs the replacement only for the one that's
>> dominated. For the shared `CmpP`, it follows uses until it reaches the
>> `If` nodes of the type checks. One is dominated. The other is not. So
>> it declares `CmpP` to not be dominated and skips it.
>> 
>> When IGVN runs next, the `CheckCastPP` that had its input replaced now
>> casts a `B` to a `A` which results to top but the check that `o` is of
>> type `A` that guards the type check still tests that `o` of type
>> `Object` is a `B`. That check doesn't constant fold. Replaced nodes
>> introduced an inconsistency. What we would have needed, is for both
>> the `CmpP` and `CheckCastPP` inputs to be changed. But that wasn't
>> possible because the `CmpP` is shared.
>> 
>> The fix I propose is a tweak to `ReplacedNodes::apply()` so it goes
>> depth first over the use of the node to be replaced (let's call it N)
>> and the use of its uses. When it hits a node that's pinned or a CFG,
>> it checks if it is dominated. If that is the case, the chain of use
>> that leads from N is cloned and the clone of the use of N gets the
>> improved input. That way if a node on that path is shared and used in
>> some CFG path that's not dominated, it is unaffected. We then let IGVN
>> cleans up extraneous clones.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Hi @rwestrel  thanks for analyzing this bug!

If I understand your algo correctly, you follow down the uses over all paths. But it is not a DFS, you might visit nodes multiple times if there are multiple paths to it. Can there be multiple paths? And if they fork and join many times, there could be an exponential explosion of paths.

If you find some use that is dominated by `ctl`, then you clone the path. You do this again and again, possibly with many paths. You possibly clone nodes multiple times if there is overlap between the paths.

So what exactly happens if there are 2 paths to a node that is dominated by `ctl`? Are eventually all inputs properly and consistently replaced? Can you help me get convinced here? Does it somehow rely on IGVN to common come cloned nodes again?

**An alternative**: First do a down-BFS starting at all replaced nodes, and find all candidate nodes for cloning. The boundary of the down-BFS is at Phi, CFG or pinned nodes, but goes through all other nodes. You mark the boundary nodes that need their inputs cloned.
Now you do a up-BFS starting at these marked nodes through the candidate nodes only, to find  which nodes you actually need to clone.
This way, you only need 2 BFS, the cost is cleanly limited. And I feel like correctness would be easier to understand/prove.

src/hotspot/share/opto/replacednodes.cpp line 239:

> 237:     // limit search depth
> 238:     if (depth >= 100 || n == nullptr) {
> 239:       return false;

How did you chose the value `100`? Could your bug be reproduced with a much more complicated case this way?
I imagine just adding 100 diamonds with some ifs.

-------------

PR Review: https://git.openjdk.org/jdk/pull/15905#pullrequestreview-1666869070
PR Review Comment: https://git.openjdk.org/jdk/pull/15905#discussion_r1351889505