RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal

Mon Apr 7 11:59:52 UTC 2025

On Mon, 31 Mar 2025 12:25:48 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes.
> 
> ### Changeset
> 
> Changes:
> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case.
> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN.
> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Performance testing
>   - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions.
>   - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions.
> 
> ### Additional issue investigation
> 
> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in itself. In the below, I consider and refer ...

src/hotspot/share/opto/cfgnode.cpp line 2580:

> 2578:         // IGVN iteration. We have put the Phi nodes on the IGVN worklist, so
> 2579:         // they are transformed later on in any case.
> 2580:         hook->destruct(igvn);

Do you need this `hook` node to preserve `new_base` now that you are not calling `PhaseGVN::transform` anymore?

src/hotspot/share/opto/phaseX.cpp line 1058:

> 1056:       // Ensure we did not increase the live node count with more than
> 1057:       // max_live_nodes_increase_per_iteration during the call to transform_old
> 1058:       DEBUG_ONLY(int increase = live_nodes_after - live_nodes_before;)

For consistency with the surrounding code, maybe you could define these as `NOT_PRODUCT`, and possibly group them under `#ifndef PRODUCT` blocks?

test/hotspot/jtreg/compiler/itergvn/TestSplitPhiThroughMergeMem.java line 67:

> 65:             new String("abcdef" + param2);
> 66:             new String("ghijklmn" + param1);
> 67:             new String("ghijklmn" + param1);

This test illustrates an interesting behavior: C2 generates around 12 Kb of code for this rather infrequent code path (and the frequency can be further reduced without affecting C2's outcome). This seems to contradict C2's general philosophy of focusing the compilation effort (and code cache usage) on hot code. It would be interesting to investigate whether there is an opportunity to make some heuristic more execution-frequency aware here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031060193
PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031064494
PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031084663