[jdk19] RFR: 8284358: Unreachable loop is not removed from C2 IR, leading to a broken graph

Tobias Hartmann thartmann at openjdk.org
Thu Jun 30 05:07:27 UTC 2022


On Wed, 29 Jun 2022 18:50:29 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Similar to https://github.com/openjdk/jdk/pull/425 and https://github.com/openjdk/jdk/pull/649, entry control to a loop `RegionNode` dies right after parsing (during first IGVN) but the dead loop is not detected/removed. This dead loop then keeps a subgraph alive, which leads to two different failures in later optimization phases that are described below.
>> 
>> I assumed that such dead loops should always be detected, but to avoid a full reachability analysis (graph walk to root), C2 only detects and removes "unsafe" dead loops, i.e., dead loops that might cause issues for later optimization phases and should therefore be aggressively removed. See `RegionNode::Ideal` -> `RegionNode::is_unreachable_region` -> `RegionNode::is_possible_unsafe_loop`:
>> 
>> https://github.com/openjdk/jdk19/blob/dbc6e110100aa6aaa8493158312030b84152b33a/src/hotspot/share/opto/cfgnode.cpp#L541-L549 
>> 
>> https://github.com/openjdk/jdk19/blob/dbc6e110100aa6aaa8493158312030b84152b33a/src/hotspot/share/opto/cfgnode.cpp#L327-L331
>> 
>> Here is a detailed description of the two failures and the corresponding fixes:
>> 
>> 1) `No reachable node should have no use` assert at the end of optimizations (introduced by [JDK-8263577](https://bugs.openjdk.org/browse/JDK-8263577)):
>> 
>> At the beginning of CCP, the types of all nodes are initialized to `top`. Since the following subgraph is not reachable from root due to a dead loop above in the CFG, the types of all unreachable nodes remain top:
>> ![1_BeforeCCP](https://user-images.githubusercontent.com/5312595/176446327-e6fdee4d-49ea-4406-9b15-b29366cd9f55.png)
>> 
>> The `Rethrow`, `Phis` and `Region` are removed during IGVN because they are `top` but the `292 CatchProj` remains:
>> 
>> ![3_BarrierExpand](https://user-images.githubusercontent.com/5312595/176446385-0374b6ba-7c0b-447d-90f9-c73e3aee4918.png)
>> 
>> We then hit the assert because the `CatchProj` has no user. Similar to how https://github.com/openjdk/jdk/pull/3012 was fixed, we need to make sure that when `RegionNode` inputs are cut off because their types are `top`, they are added to the IGVN worklist (see change in `cfgnode.cpp:504`). With that, the entire dead subgraph is removed.
>> 
>> 2) `Unknown node on this path` assert while walking the memory graph during scalar replacement:
>> 
>> After parsing, the `167 Region` that belongs to a loop loses entry control (marked in red):
>> ![2_Diff_Parsing_IGVN](https://user-images.githubusercontent.com/5312595/176453465-95f48c16-6cb7-4373-baa8-edf5e4fbcde2.png)
>> 
>> The dead loop is not detected/removed because it's not considered "unsafe" since the Phis of the dying Region only have a Call user which is considered safe:
>> 
>> https://github.com/openjdk/jdk19/blob/dbc6e110100aa6aaa8493158312030b84152b33a/src/hotspot/share/opto/cfgnode.cpp#L352-L355
>> 
>> ![DyingRegion](https://user-images.githubusercontent.com/5312595/176469880-f81a7d7e-b769-444a-bf5b-14f8cca1f9af.png)
>> 
>> The same can happen with other CFG users (for example, MemBars or Allocates). These scenarios are also covered by the regression test. Later during IGVN, `309 Region` which is part of the now dead subgraph is processed and found to be potentially "unsafe" and unreachable from root:
>> 
>> ![1_AfterParsing](https://user-images.githubusercontent.com/5312595/176453110-8a4a587f-f1ef-45bf-8a68-e476f142aa7e.png)
>> 
>> It's then removed together with its Phi users, leaving `505 MergeMem` with a top memory input:
>> 
>> ![3_MacroExpansion](https://user-images.githubusercontent.com/5312595/176461343-ab446fe0-04a8-48a5-95c2-c8ead6c872cf.png)
>> 
>> We then hit the assert when encountering a top memory input while walking the memory graph during scalar replacement.
>> 
>> The root cause of the failure is an only partially removed dead subgraph. A similar issue has been fixed long ago by [JDK-8075922](https://bugs.openjdk.org/browse/JDK-8075922), but the fix is incomplete. I propose to aggressively remove such dead subgraphs by walking up the CFG when detecting an unreachable Region belonging to an "unsafe" loop and replacing all nodes by `top`. 
>> 
>> Special thanks to Christian Hagedorn for helping me with finding a regression test.
>> 
>> Thanks,
>> Tobias
>
> src/hotspot/share/opto/cfgnode.cpp line 504:
> 
>> 502:       }
>> 503:       if( phase->type(n) == Type::TOP ) {
>> 504:         set_req_X(i, NULL, phase); // Ignore TOP inputs
> 
> This is not guarded by `can_reshape` (call from IGVN). It is not correct to use set_req_X() during parsing.

Since [JDK-8263577](https://bugs.openjdk.org/browse/JDK-8263577), there are two versions of `Node::set_req_X`. The new one takes a `PhaseGVN` and falls back to a regular `set_req`:
https://github.com/openjdk/jdk19/blob/dbc6e110100aa6aaa8493158312030b84152b33a/src/hotspot/share/opto/phaseX.cpp#L2159-L2165

-------------

PR: https://git.openjdk.org/jdk19/pull/92


More information about the hotspot-compiler-dev mailing list