RFR: 8280126: C2: detect and remove dead irreducible loops [v9]
Emanuel Peter
epeter at openjdk.org
Tue Jan 31 09:33:45 UTC 2023
> **Context**
> If a `LoopNode` loses entry control, we remove it, to prevent having a dead-loop (backedge would be only input to LoopNode):
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/cfgnode.cpp#L541-L544
>
> We must remove such dead code, otherwise all sorts of bad graph patterns can be created, including self-referring Add nodes etc, and that would either hit asserts, or crash the VM.
>
> Also `PhiNode` does some checks to avoid creating a dead-loop:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/cfgnode.cpp#L2004-L2019
>
> However, all of this logic assumes that we have properly canonicalized reducible loops: every loop-head must be a `LoopNode`, where we have a loop-entry-control, and a backedge-control. Once the loop-entry-control dies, we know the loop is dead-code.
>
> **Problem**
>
> This dead-loop removal logic does not work for irreducible loops. I found many JASM examples, and even was able to produce a Java reproducer. I have seen these asserts triggered:
>
> Self-referencing data node:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/phaseX.cpp#L943
>
> We find dead-code CFG nodes:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/loopnode.cpp#L5293
>
> Dead loop of phi's without any input:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/cfgnode.cpp#L2539
>
> We must remove the loop once it loses its last entry-control. The problem is that a irreducible loop has multiple entries, by definition. Irreducible loops have no `LoopNodes`, they are simply `RegionNodes` with multiple inputs. If one of the controls is lost, it is a priori not clear if this was the last entry-control for the whole irreducible loop.
>
> **Solution Summary**
>
> We mark every `RegionNode` with one of these three:
>
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.hpp#L103-L112
>
> We remove irreducible loops like this:
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L591-L603
>
> And during `PhaseIdealLoop::build_loop_tree` we verify that all irreducible loop-entries are marked as `MaybeIrreducibleEntry`:
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/loopnode.cpp#L5161-L5162
>
> Additionally, we can verify that no irreducible loop contains regions marked as `Reducible`:
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/loopnode.cpp#L5277-L5279
>
> **Solution Details**
>
> During parsing, we call `ciTypeFlow::Block::copy_irreducible_status_to(RegionNode* region, const JVMState* jvms)` for every `RegionNode` created for block-merges. This checks `ciTypeFlow::Block::is_in_irreducible_loop()`, to see if the relevant block of this region is in an irreducible loop (see "Alternative Solutions" below, why I mark all regions inside irreducible loops with `MaybeIrreducibleEntry`). For this to work, I had to slightly improve the irreducible loop detection in `ciTypeFlow`. One could improve the marking after every `PhaseIdealLoop::build_loop_tree`, but that comes at the cost of more compile-time, so I did not implement it.
>
> In some cases, new regions are created that need to be marked with `MaybeIrreducibleEntry`. For example `IdealLoopTree::split_fall_in` can split a irreducible loop head, after which both are irreducible loop entries:
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/loopnode.cpp#L3143-L3144
>
> Regions for which we do not explicitly set `MaybeIrreducibleEntry` are marked as `NeverIrreducibleEntry`. We do this for any new regions that are added, for example by the `GraphKit`. They all seem to be safe, as those regions can never become irreducible loop entries.
>
> I tried to mark as many regions as possible with `Reducible`, so that we can do stronger asserts. So no enclosing loop of a region is irreducible, we mark it `Reducible`, and assert that it will never be inside a irreducible loop. However, checking for an outher irreducible loop turns out to be tricky when we have inlining: regions in the inner method need to check if they are in a irreducible loop of the outer method. I tried to implement this, but found it to be too difficult (We would need to find the block in the outer method where the inlining of the inner method happens. An additional complication is that the method only stores the non-OSR ciTypeFlow, even if the outer method is OSR compiled - thus the irreducible status can be inaccurate). So I just mark the nodes as `NeverIrreducibleEntry` if they are not in an irreducible loop in the current loop, and there is an outer loop. This is safe (they can never be irreducible entries): the region would have to merge a "backedge" a
nd an "entry", both separately entering the inlined method, but there is only a single entry point to the inlined method.
>
> Also `PhiNodes` have to be handled more carefully, hence I block phi's in irreducible loops from acting on `TOP` inputs until the `Region` has a chance to react:
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L2048-L2054
>
> When we detect a subgraph is a dead-loop, we remove it with
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L797
> I had to refactor it a bit to be more agressive (first gather all dead nodes, only then remove them), which lead to the discovery of a few optimizations that could not deal with TOP inputs.
>
> It has been known that `split_flow_path` can cause new irreducible loops to appear [JDK-6742111](https://bugs.openjdk.org/browse/JDK-6742111). With my marking of regions and the verification of it, we cannot allow new irreducible loops to appear. However, from the testing and fuzzing I have performed, it seems that this can only happen when there is already another irreducible loop in the graph. Thus it seems sufficient to disable the optimization if there are already irreducible loops present:
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L1841-L1853
>
> **Alternative Solutions**
>
> It is the long-term goal to remove irreducible loops from the graph, either by node-splitting or the dispatcher approach. So for now, we want a fix that works and is not too complex. If this fix turns out to be too slow, especially because of additional readability traversals, then we may need to revisit alternatives.
>
> A first and most brute-force approach would have been to simply do a reachability check for all regions, once an input control is lost. Or only do it if there is an irreducible loop anywhere in the graph. But that would clearly lead to some slowdown for OSR compilation, as they often have some irreducible loop in the graph. It is thus better to limit the number of nodes that need to check reachability.
>
> _Why did I not exclusively mark regions that are irreducible loop entries?_
> An entry of an irreducible loop can lose all internal edges ("backedges"), collapse, and float outside the loop. The entry is now further down the CFG from the old entry, possibly through a series of if/region. One could attempt to move the marking to the new entries, but that would be a complex task. An example can be found in regression test `test_009`.
>
> _Can we identify the smallest set of nodes that would ever be irreducible entries?_
> This is tricky. `build_loop_tree` finds loop-heads, which would certainly all have to be marked. However, finding all secondary entries is something one would have to do. Currently, when we find a second entry, we stop there, but to determine all secondary entries one would probably have to traverse further into the loop again. One might be able to mark less nodes, in some cases where we have reducible loops inside irreducible loops, where the loop-head of the (inner) reducible loop is not an entry of the (outer) irreducible loop. It is not yet clear to me how to do that, and if it really speeds things up enough to justify the added complexity.
>
> **Testing**
>
> This bug was reported with a modified classfile, as far as we know the bytecode must have been modified/fuzzed. I then wrote my own bytecode fuzzer that produces JASM code with irreducible loops, and quickly found all sorts of similar failures. I am already working on porting this fuzzer to Java [JDK-8299214](https://bugs.openjdk.org/browse/JDK-8299214), and hopefully integrate it into testing. It did not just find issues with irreducible loops, but also with infinite loops.
>
> I added many tests, some reduced down from that fuzzer, some hand-crafted.
>
> So far, this change passes stress-testing and tier1-tier7. Rerunning it again now...
> Performance testing shows no significant runtime change.
>
> **Update**: I decided that we can always run `verify_regions_in_irreducible_loops`. The overhead is very minimal. In most cases we do not even have irreducible loops, and so there is no overhead. In the appended regression test where there are a lot of irreducible loops, we have less than one percent overhead on the compilation time.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
Updated copyright to 2023
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/11764/files
- new: https://git.openjdk.org/jdk/pull/11764/files/3cea2e5b..d7d9c21b
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=11764&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=11764&range=07-08
Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/11764.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11764/head:pull/11764
PR: https://git.openjdk.org/jdk/pull/11764
More information about the hotspot-compiler-dev
mailing list