RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v2]

Mon Oct 6 13:10:47 UTC 2025

On Wed, 1 Oct 2025 14:44:31 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job.
>> 
>> When we have something like
>> 
>> // any loop
>> while (...) { /* something involving limit */ }
>> // counted loop with zero trip guard
>> if (i < limit) {
>>     for (int i = init; i < limit; i++) { ... }
>> }
>> 
>> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`.
>> 
>> But the method `PhaseIdealLoop::do_unroll` has the assert
>> 
>> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930
>> 
>> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop.
>> 
>> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling.
>> 
>> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do?
>> 
>> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all.
>> 
>> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling.
>>...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add test

Thanks for the summary and listing the options! I had a feeling that this fix is not enough and we can also trigger it without Loop Peeling. I had a go and indeed found a case without stress peeling:

[Test.java](https://github.com/user-attachments/files/22722318/Test.java)

This, unfortunately, makes this fix not sufficient and we need to find a better solution. But what is there left to do?
- As you pointed out, running IGVN just for this one assert seems overkill. Also applying some local pre-IGVN phi clean-up hacks seems like duplicated effort and possibly error-prone.
- Removing the asserts since we seem to do the right thing anyway: Doable but then we have no protection anymore when we later introduce a bug (or already have an existing lurking bug) where we mess this property up. We hit this assert in the past due to bugs ([link](https://bugs.openjdk.org/issues/?jql=text%20~%20%22%5C%22opaq-%3Eoutcnt()%20%3D%3D%201%5C%22%22)). So, I would be less inclined to remove the asserts.
- You could have a go to somehow prove that the equality is just hidden by some useless phis but as you mentioned already, it might not be so straight forward and difficult to get right.
- Could we somehow bail out of loop unrolling or just not apply it at all when we have this this inequality and wait until after IGVN? We would probably then still need some mechanism to re-check that after IGVN, we now have the same node and not just endlessly bail out again without noticing a real problem.
- ...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3371554178