RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5]

Mon Jan 20 12:10:46 UTC 2025

On Tue, 7 Jan 2025 18:00:21 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Good analysis, Daniel! Given the presence of overlapping memory phis (memory phis that are placed in the same block and include aliasing memory slices), the general idea of this fix seems reasonable to me. As a more fundamental solution, it would be worth investigating (perhaps separately) the root cause of this overlap and exploring whether it is feasible to enforce disjointness (an invariant apparently assumed by the original `PhaseCFG::insert_anti_dependences` algorithm), at least during code generation.
>> 
>> Does the comment above the definition of `initial_mem` require any update as part of this change?
>
> Thanks for the review @robcasloz!
> 
>> Given the presence of overlapping memory phis (memory phis that are placed in the same block and include aliasing memory slices), the general idea of this fix seems reasonable to me. As a more fundamental solution, it would be worth investigating (perhaps separately) the root cause of this overlap and exploring whether it is feasible to enforce disjointness (an invariant apparently assumed by the original PhaseCFG::insert_anti_dependences algorithm), at least during code generation.
> 
> Yes, I think it could be useful to investigate further. Based on my observations while working on this issue, the overlapping memory Phis likely result from loop peeling. However, the Phi overlap is not the sole cause of this issue, as the second example demonstrates. I suggest we write an RFE and investigate in a separate issue.
> 
>> Does the comment above the definition of initial_mem require any update as part of this change?
> 
> Yes, thanks. Added now.

> @dlunde Thanks for the updates. I still struggle to understand the algorithm completely. Christian and I looked at it today for 1.5h. I think it may be best if we schedule a call to discuss this tomorrow.

Thanks @eme64 and @chhagedorn for the time investment. Sounds good to discuss it tomorrow. I'll DM you to decide on a time slot.

> We also thought that the documentation of the algorithm is a big difficult to understand (before your changes), and it would be nice if we could improve that now that we are digging into it anyway.

I agree, the overall documentation of `insert_anti_dependences` is lacking.

> We are also not sure if the graph is sane before insert_anti_dependences.

Right, if the input graph is sane is a very valid question and for sure requires further investigation (I was thinking as a follow-up RFE, but let's discuss this).

> It would be nice if you could show us a graph that includes the CFG graph and the memory graph, so that it is a little easier to understand what happens. It is interesting that in Example 1, it seems that both the Load and membar_release are in the same early block.

Yes, I have quite a few examples that we can have a look at.

> We also wondered why the membar_release for the volatile byte ends up on the same slice as the int field. It's well possible that this is correct, but just less than optimal.

The `membar_release` has bot memory as input, so we have to raise the load above it. But there is perhaps some way to isolate the slices before we call `insert_anti_dependences`, to make the anti-dependence search less conservative.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22852#issuecomment-2602253139