RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v13]

Mon Feb 10 15:56:14 UTC 2025

On Fri, 7 Feb 2025 14:36:01 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Thanks for having yet another look at this!
>> 
>>> If I comment out the line it solves all the failures we have seen. I double-checked that we then perform the exact 
>> MergeMem/Phi swap idealizations discussed above.
>> 
>> That sounds promising! Looks like this temporary restriction became quite permanent - it's from initial load. I'm wondering if that is still necessary and if so if we have tests to catch that (we would probably hit the "infinite loop in IGVN" in that case).
>> 
>>> I am wondering what the proper solution is here. I will, of course, investigate if it is possible to loosen the restriction and still ensure termination. On the other hand, it also seems strange that the anti-dependence search is so sensitive to missing idealizations?
>> 
>> That would be great if we can get around this termination issue somehow - if it's still a problem. I think that is very unfortunate that we might be relying on this Ideal transformation to be applied to ensure correctness later on. If it's really required, we should at least make sure to add some verification code to catch this in debug builds. You could, for example, just turn what you have now into verification code, i.e. check that we cannot find another anti dependency edge with another search root. And/Or re-apply this particular transformation for each Phi node again in the end to see if we missed some swaps.
>
> I have now investigated the `PhiNode::Ideal` restriction above. In summary, I have not found any simple change that resolves the present issue *and* does not introduce problems elsewhere.
> 
> Here is what I have tried. Both changes solve the present issue.
> 1. Remove the restriction entirely. As the source code comment suggests, this results in non-termination (which is very easy to verify). The reason is that memory Phis are (naturally) often circular, and there are plenty of cases where we push MergeMems indefinitely across circular Phis.
> 2. Only apply the idealization if we can guarantee that we are not pushing MergeMems over Phis in a circular manner. I check this through a complete upwards walk of the memory graph from the current Phi to ensure we cannot reach it from itself. This is likely quite expensive and we can probably do something more clever. It kind of works, but there are still tests that fail. Even if we now do terminate, I suspect we still have a combinatorial explosion of new split Phi nodes in certain cases, because we hit the MemLimit in many of the failing tests.
> 
> I can and probably will continue to investigate option 2, but it feels like that should be a separate RFE. I'm open to suggestions.
> 
>> You could, for example, just turn what you have now into verification code, i.e. check that we cannot find another anti dependency edge with another search root.
> 
> @chhagedorn Yes, I agree. If we want to enforce a memory graph invariant at the time of `insert_anti_dependences`, we should also assert that it holds as best we can.

> @dlunde do we know at which point in the compilation chain the disjoint memory state invariant (that the above idealization restores) is broken? Would it be possible to do some analysis at that point to "simply" avoid producing the problematic memory subgraph in the first place?

I've looked at this before, and have now revisited it again. It *does* look like the problematic memory subgraph results due to loop peeling, even though I discarded this hypothesis before together with @chhagedorn. @chhagedorn In our investigation where you helped me manually turn off loop peeling (`TestNoPeeling.java` attached to the issue), we only turned off full loop peeling. We actually still run partial peeling in that example. If I add `-XX:-PartialPeelLoop` to the set of flags (in addition to your patch turning of standard loop peeling), the issue disappears.

Below is a small example demonstrating what happens during loop peeling. Left is before partial peeling and right after. Before peeling, there is a path from `7 Parm` (`initial_mem`) to `183 Phi`. This ensures we properly raise the LCA and add anti-dependence edges. After peeling, we have cloned the loop body (in `PhaseIdealLoop::clone_loop`, resulting in the clone `325 MergeMem` of `231 MergeMem`) and then merged the clones with a new `339 Phi`. The path from `7 Parm` to `183 Phi` is now blocked and we fail to raise the LCA and add anti-dependence edges.
![Screenshot from 2025-02-10 16-29-15](https://github.com/user-attachments/assets/9551c43c-c72a-4bc5-989a-918d61e2e335)

> I wonder if the core invariant we want to assert here is that two memory states with aliasing slices never overlap in time, after GCM and LCM are done. This could be checked by performing liveness analysis of the memory subgraph after GCM and LCM. This may sound expensive to compute but it could turn out to be acceptable in practice (for debug builds). The similarly expensive register-level liveness analysis in PhaseOutput::perform_mach_node_analysis takes no more than 1-2% of the entire C2 execution time, on average.

Sounds like a great idea, but I think we need to discuss the details further first. It is not quite clear to me yet what it is we want to assert.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1949383171