RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000)

Wed May 7 09:03:15 UTC 2025

On Wed, 7 May 2025 06:57:29 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> Certain idealizations introduce more new nodes than we expected when we added the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic.
>> 
>> ### Changeset
>> 
>> Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change affects not only the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout, which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the default bailout is currently at 80000 - 4000 = 76000 live nodes, and will be at 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below).
>> 
>> The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale with the size of the ideal graph (i.e., they are local transformations).
>> 
>> I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (an overly optimistic limit).
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts either before or after this changeset.
>
> Looks good to me. Just a quick (curiosity) question: why did you choose to multiply by 3 and not, for instance, doubling the current max amount (i.e. x 4)?

@dafedafe
> Looks good to me. Just a quick (curiosity) question: why did you choose to multiply by 3 and not, for instance, doubling the current max amount (i.e. x 4)?

Thanks for the review! We'd like to keep this bound as small as possible so that we do not get an overly conservative IGVN node count bailout. But I guess using 4 instead of 3 doesn't really matter in practice, and perhaps it looks cleaner to use a power of 2.
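
For reference, a minimal sketch of the arithmetic behind the candidate factors, assuming the default values mentioned in this thread (`MaxNodeLimit` = 80000 and `NodeLimitFudgeFactor` = 2000). The constants and the helper `igvn_bailout_threshold` are hypothetical names for illustration only, not HotSpot code.

```cpp
#include <cstdint>

// Defaults as quoted in this thread. These are illustrative constants,
// not the actual HotSpot flag definitions.
constexpr std::uint32_t kMaxNodeLimit = 80000;
constexpr std::uint32_t kNodeLimitFudgeFactor = 2000;

// Hypothetical helper: the live node count at which IGVN would bail out,
// given the multiplier applied to NodeLimitFudgeFactor.
constexpr std::uint32_t igvn_bailout_threshold(std::uint32_t factor) {
  return kMaxNodeLimit - kNodeLimitFudgeFactor * factor;
}

static_assert(igvn_bailout_threshold(2) == 76000, "before this changeset");
static_assert(igvn_bailout_threshold(3) == 74000, "with this changeset");
static_assert(igvn_bailout_threshold(4) == 72000, "with a factor of 4");
```

Under these assumptions, each additional multiple of `NodeLimitFudgeFactor` moves the bailout 2000 live nodes earlier.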

@robcasloz @eme64
> If we cannot bound the amount of nodes that can be created by PhiNode::Ideal, wouldn't it be more robust to simply disable the single-iteration node increase assertion for PhiNode? Otherwise there is the risk that we encounter the failure again with a slightly larger test case. Alternatively, if we could (?) derive a tighter bound for PhiNode (e.g. based on its number of inputs, number of memory slices for memory phis, etc.) we could try to compute it and use it in the assertion.

> @robcasloz @dlunde Yes, such an exception may help us keep tight bounds on most nodes. And maybe we can even quantify more precisely how many nodes we expect to be created by PhiNode::Ideal. Maybe it is somehow linear in its inputs?

The issue with weakening the per-iteration assertion in special cases is that we _must_ ensure that we do not grow by more than `max_live_nodes_increase_per_iteration` in a single iteration. Below is my failure analysis for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), which describes the issue.

> After the changes for JDK-8333393, we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in PhaseIterGVN::optimize. In particular, when we are close to the MaxNodeLimit (80 000 by default), it can happen that we go from below MaxNodeLimit - NodeLimitFudgeFactor * 2 (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the PhaseIterGVN::optimize loop does not trigger as expected and we instead crash at an assert in node creation as we surpass MaxNodeLimit nodes.
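
The overshoot described in that analysis can be modeled with a small sketch. It assumes the defaults from this thread, assumes that an iteration proceeds whenever the live node count is strictly below the bailout threshold, and uses the increase of 4470 nodes reported in the assert message as a sample value. The function `can_exceed_max_node_limit` and the constants are hypothetical and do not mirror the actual code in `PhaseIterGVN::optimize`.

```cpp
#include <cstdint>

// Defaults as quoted in this thread (illustrative, not HotSpot definitions).
constexpr std::uint32_t kMaxNodeLimit = 80000;
constexpr std::uint32_t kNodeLimitFudgeFactor = 2000;

// Hypothetical model: returns true if an iteration that starts at the largest
// live node count still passing the bailout check (threshold - 1, assuming a
// strict "below threshold" check) can end above MaxNodeLimit after adding
// `increase` live nodes.
constexpr bool can_exceed_max_node_limit(std::uint32_t factor,
                                         std::uint32_t increase) {
  return (kMaxNodeLimit - kNodeLimitFudgeFactor * factor - 1) + increase
         > kMaxNodeLimit;
}

// With the old factor of 2, an increase of 4470 can pass MaxNodeLimit.
static_assert(can_exceed_max_node_limit(2, 4470), "75999 + 4470 > 80000");
// With the factor of 3 from this changeset, the same increase stays below it.
static_assert(!can_exceed_max_node_limit(3, 4470), "73999 + 4470 <= 80000");
```

In this model, the bailout only protects against crossing `MaxNodeLimit` as long as no single iteration adds more nodes than the margin, which is why the per-iteration bound cannot simply be waived for individual node types.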

I guess we could just remove [the assert during node creation](https://github.com/openjdk/jdk/blob/50554fa1982f042fb1d7b6c8a16334b97b31bb63/src/hotspot/share/opto/node.cpp#L78) as an alternative solution, or disable it during IGVN (similarly to how it is currently disabled during code generation).

> @dlunde it would also be interesting to look more deeply into PhiNode::Ideal, and see what happens there. The Phi has 150+ inputs, but how does that generate 4k+ nodes? That would be 4000/150 ~ 25+ nodes per input. I'm just wondering if this is really sane? And is it profitable? Might it be better to check if we are creating that many nodes before doing it, and blowing through the node budget? It might be worth investigating. But I do hear that it is difficult to reproduce.

I agree that we should investigate further, but suggest doing this in a separate RFE (so that the assert does not continue to pollute testing pipelines). My semi-educated guess is that the new nodes are added as part of the call to `MemNode::optimize_memory_chain` in `PhiNode::Ideal`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2857755141