RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000)

Wed May 7 07:41:19 UTC 2025

On Tue, 6 May 2025 15:34:51 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Certain idealizations introduce more new nodes than was expected when the new assert was added in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic.
>> 
>> ### Changeset
>> 
>> Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change affects not only the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout, which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the default bailout is currently at 80000 - 4000 = 76000 live nodes and moves to 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below).
>> 
>> The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations).
>> 
>> I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (an overly optimistic limit).
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts either before or after this changeset.
>
> Looks reasonable to me.

> @chhagedorn Thanks for the review!
> 
> @eme64
> 
> > @dlunde Did I understand this right: a single node was transformed, and it created over 4k new nodes?
> 
> Yes, a single call to `transform_old` resulted in more than 4k new nodes.
> 
> > ```
> >       DEBUG_ONLY(int live_nodes_before = C->live_nodes();)
> >       Node* nn = transform_old(n);
> >       DEBUG_ONLY(int live_nodes_after = C->live_nodes();)
> > ```
> > 
> > Do you know which node was transformed, and what exactly happens there?
> 
> I investigated, but did not manage to reproduce the failure locally (so I could not look at it in detail). No success with reproducing through replay files either. In Oracle-internal testing, the failure reproduces in only 1% of the test runs. I did do a simple dump of the nodes `n` and `nn` during an iteration that triggered the assert, and got the below.
> 
> ```
> 19517  Phi  === 1444 17794 19566 19568 19570 14402 19571 19573 19575 19576 19580 19584 19585 19591 19597 19598 19606 19617 19720 19704 19826 19703 19837 19618 19851 19701 19956 19700 19970 19698 19987 19697 20004 19695 20024 19694 20044 19692 20067 19691 20090 19689 20116 19688 20142 19686 20171 19685 20200 19683 20232 19682 20264 20301 19620 17860 20405 19680 20441 20480 19621 19678 20583 19677 20622 20666 20768 17855 20869 19675 20912 19623 20958 19673 21058 19672 21104 19624 17850 17850 21155 19670 21206 19668 21260 19667 21314 17845 21373 19665 21431 19663 21492 19662 21553 21615 19660 21678 19659 21742 21807 19657 21873 19656 21940 22008 19654 22077 19653 22147 22218 19651 22290 19650 22363 22437 19648 22512 19647 22588 22665 19645 22743 19644 22822 22902 19642 22983 19641 23065 23148 19639 23232 19638 23317 23403 19636 23490 19635 23578 23667 19633 23757 19632 23848 23940 19630 24033 19629 24127 24222 19627 24318 19626 24415 24513  [[ 19529 ]]  #memory  Memory: @java/lang/Long (java/io/Serializable,java/lang/Comparable,java/lang/constant/Constable,java/lang/constant/ConstantDesc):NotNull:exact+16 *,iid=4638, name=value, idx=32; !orig=19506,[2618],[39511],[764] !jvms: VarHandleTestByteArrayAsLong::testArrayReadWrite @ bci:83 (line 1059)
> 
> 19517  Phi  === 1444 17794 19566 19568 19570 14402 19571 19573 19575 19576 19580 19584 19585 19591 19597 19598 19606 19617 19720 19704 19826 19703 19837 19618 19851 19701 19956 19700 19970 19698 19987 19697 20004 19695 20024 19694 20044 19692 20067 19691 20090 19689 20116 19688 20142 19686 20171 19685 20200 19683 20232 19682 20264 20301 19620 17860 20405 19680 20441 20480 19621 19678 20583 19677 20622 20666 20768 17855 20869 19675 20912 19623 20958 19673 21058 19672 21104 19624 17850 17850 21155 19670 21206 19668 21260 19667 21314 17845 21373 19665 21431 19663 21492 19662 21553 21615 19660 21678 19659 21742 21807 19657 21873 19656 21940 22008 19654 22077 19653 22147 22218 19651 22290 19650 22363 22437 19648 22512 19647 22588 22665 19645 22743 19644 22822 22902 19642 22983 19641 23065 23148 19639 23232 19638 23317 23403 19636 23490 19635 23578 23667 19633 23757 19632 23848 23940 19630 24033 19629 24127 24222 19627 24318 19626 24415 24513  [[ 19529 ]]  #memory  Memory: @java/lang/Long (java/io/Serializable,java/lang/Comparable,java/lang/constant/Constable,java/lang/constant/ConstantDesc):NotNull:exact+16 *,iid=4638, name=value, idx=32; !orig=19506,[2618],[39511],[764] !jvms: VarHandleTestByteArrayAsLong::testArrayReadWrite @ bci:83 (line 1059)
> ```
> 
> That is, a (locally unchanged) large Phi node. I would assume `PhiNode::Ideal` added 4k new nodes somewhere further up the inputs.

If we cannot bound the number of nodes that `PhiNode::Ideal` can create, wouldn't it be more robust to simply disable the single-iteration node increase assertion for `PhiNode`? Otherwise there is a risk that we hit the failure again with a slightly larger test case. Alternatively, if we could derive a tighter bound for `PhiNode` (e.g. based on its number of inputs, the number of memory slices for memory phis, etc.), we could compute it and use it in the assertion.
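
To make the second alternative a bit more concrete, here is a rough, self-contained C++ sketch of what an input-dependent budget could look like. It is not HotSpot code: `phi_budget`, `increase_ok`, `per_input_cost`, and the `k...` constants are hypothetical stand-ins, the defaults (80000 and 2000) are only those implied by the numbers quoted above, and the premise that a Phi creates a bounded number of nodes per input and memory slice is an unproven assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-ins for the defaults implied by the PR description
// (MaxNodeLimit = 80000, NodeLimitFudgeFactor = 2000). Not HotSpot code.
constexpr uint32_t kMaxNodeLimit = 80000;
constexpr uint32_t kNodeLimitFudgeFactor = 2000;

// Fixed per-iteration budget proposed by the changeset: NodeLimitFudgeFactor * 3.
constexpr uint32_t fixed_budget() {
  return kNodeLimitFudgeFactor * 3;
}

// Live node count at which IGVN bails out: MaxNodeLimit minus the budget.
constexpr uint32_t bailout_threshold(uint32_t budget) {
  return kMaxNodeLimit - budget;
}

// Hypothetical Phi-specific budget. Assumes (without proof) that idealizing a
// Phi creates at most per_input_cost new nodes per input and memory slice.
// Never drops below the fixed budget. Inputs are assumed small enough that
// the product does not overflow 32 bits.
constexpr uint32_t phi_budget(uint32_t num_inputs, uint32_t num_slices,
                              uint32_t per_input_cost) {
  return std::max(fixed_budget(), num_inputs * num_slices * per_input_cost);
}

// Same shape as the existing check (increase < budget). A shrinking graph
// always passes.
constexpr bool increase_ok(uint32_t live_before, uint32_t live_after,
                           uint32_t budget) {
  return live_after <= live_before || (live_after - live_before) < budget;
}
```

With the fixed limits, the observed increase of 4470 exceeds the old budget of 4000 and stays within the new budget of 6000. Whether a formula like `phi_budget` is sound depends entirely on what `PhiNode::Ideal` can actually create per input, which we have not established.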

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2857442781