RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7]

Thu Apr 24 08:35:54 UTC 2025

On Wed, 23 Apr 2025 17:01:33 GMT, Dhamoder Nalla <dhanalla at openjdk.org> wrote:

>>> @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA.
>> 
>> You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state.
>> Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved — each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used.
>> However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA.
>
>> > @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA.
>> 
>> You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state. Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved — each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used. However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA.
> 
> @eme64 If this answers your question, this PR is ready for review

@dhanalla I see. @chhagedorn and I quickly looked through the code, and it seems there are other bailouts that use the FudgeFactor.

It also seems that you need an unreasonably high `EliminateAllocationArraySizeLimit`, and so this failure should never actually happen normally, right? Or is it possible to reproduce the same bug with a lower `EliminateAllocationArraySizeLimit` but just more allocations? If so, it would be good if you added such test cases.

Is it possible to exceed the node limit with the default `EliminateAllocationArraySizeLimit`, i.e. so that we would hit the assert before your changes, and bailout after your changes?

I have two worries, and maybe @vnkozlov can say something here:
- By the time we check the condition and bail out, we may have allocated a lot of nodes, and possibly be far over the node limit. That means we already used a lot of memory and time. How bad can this get?
- And as discussed above: we could have done EA partially, until getting close to the node limit, and then not do allocation elimination on the remaining allocations. That would be a partial benefit, which we do not have if we recompile without EA.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2826798704