RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5]

Tue Jan 21 00:51:37 UTC 2025

On Mon, 20 Jan 2025 10:41:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Thanks for the review @robcasloz!
>> 
>>> Given the presence of overlapping memory phis (memory phis that are placed in the same block and include aliasing memory slices), the general idea of this fix seems reasonable to me. As a more fundamental solution, it would be worth investigating (perhaps separately) the root cause of this overlap and exploring whether it is feasible to enforce disjointness (an invariant apparently assumed by the original PhaseCFG::insert_anti_dependences algorithm), at least during code generation.
>> 
>> Yes, I think it could be useful to investigate further. Based on my observations while working on this issue, the overlapping memory Phis likely result from loop peeling. However, the Phi overlap is not the sole cause of this issue, as the second example demonstrates. I suggest we write an RFE and investigate in a separate issue.
>> 
>>> Does the comment above the definition of initial_mem require any update as part of this change?
>> 
>> Yes, thanks. Added now.
>
> @dlunde Thanks for the updates. I still struggle to understand the algorithm completely. Christian and I looked at it today for 1.5h. I think it may be best if we schedule a call to discuss this tomorrow.
> 
> We also thought that the documentation of the algorithm is a big difficult to understand (before your changes), and it would be nice if we could improve that now that we are digging into it anyway.
> 
> We are also not sure if the graph is sane before `insert_anti_dependences`. It would be nice if you could show us a graph that includes the CFG graph and the memory graph, so that it is a little easier to understand what happens. It is interesting that in Example 1, it seems that both the `Load` and `membar_release` are in the same `early` block.
> 
> We also wondered why the `membar_release` for the volatile byte ends up on the same slice as the int field. It's well possible that this is correct, but just less than optimal.

@eme64 `membar_release` aliases everything because it is a release barrier, it must prevent earlier loads from moving across it.

@dlunde IIUC the issue is that an anti-dependency search stops at `PhiNode`s while it must look through it because the memory is not dead at that `Phi` but it is killed later. The fact that the memory is alive at the `Phi` means that we do not make the load anti-dependent on the `Phi` but since we stop searching we fail to make the load anti-dependent on the memory kill later.

The solution you proposed here is to look at more `Phi`s. This resumes the search, albeit in a seemingly accidental manner, helps explore the graph better, and reduces the probability of us missing an anti-dependent memory kill. However, I cannot see the relation between the `Phi`s we need to search and the scope you are searching. What does the early block have anything to do with the memory graph here? The early block takes into consideration the control input and the pointer input, too, which seems unrelated to the memory graph.

For example, in your example 2, if I just arbitrarily remove the control input of `100 loadI`, then how can your solution manage to find `76 Phi` to do anti-dependence search on? I feel that in such cases the load is still anti-dependent on `58 membar_release`, right?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22852#issuecomment-2603418572