RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v10]
Daniel Lundén
dlunden at openjdk.org
Mon Jan 27 11:35:51 UTC 2025
On Mon, 27 Jan 2025 10:01:40 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> OK, I see, thanks for clarifying @merykitty. Then, to summarize, I believe that there should be an invariant for the memory graph at the point of `insert_anti_dependences`: for each load, there can be only one valid memory input. If we had such an invariant, the current algorithm would be correct and would find all relevant anti-dependences.
>>
>> The changeset I now propose is a local patch that deals with miscompilations that we now observe (and the patch is easy to backport). Hopefully, we can find some way to enforce the invariant above in a follow-up RFE. Let me know what you think.
>
> @dlunde It would be great if you can write a formal proof regarding the points I made for the explanation in the code :)
@merykitty I guess what you mean is a formal soundness proof for the overall approach in `insert_anti_dependences`? While I do agree with you that this would be great, it sounds out of scope for this patch. What we've done here is observe that there are cases where we currently fail, and patch the code to fix those cases (as we should make sure we fix miscompilations promptly).
I am trying to understand your earlier proposed explanation, but can't say I have succeeded yet. Some comments:
> In a basic block `b` ...
It seems here you assume a total order of memory nodes within blocks? Note that we have not run LCM at this point, so I don't see how we can reason in (exactly) this manner. We can get the required information from the memory graph though.
> If n is undefined at the end of any of the predecessors of b, then n is undefined at the beginning of b.
> Otherwise, if n is killed at the end of any of the predecessors of b, then n is killed at the beginning of b.
Yes, this is why we stop the anti-dependence search at Phis and (conservatively) raise the LCA above the input from which we reached the Phi. Why do you require the distinction between undefined and killed?
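To make the Phi rule above concrete, here is a toy sketch of the conservative raise. This is not HotSpot code: the dominator-tree representation and function names are invented, and the real code raises the LCA via per-block marks rather than a direct dominator-LCA call, but the net effect modeled here is the same: fold in the predecessor block feeding the Phi input we came through, not the Phi's merge block itself.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree (invented representation, not HotSpot's): parent[b]
// is the immediate dominator of block b; block 0 is the root.
struct DomTree {
    std::vector<int> parent;
    std::vector<int> depth;
    int lca(int a, int b) const {
        while (depth[a] > depth[b]) a = parent[a];
        while (depth[b] > depth[a]) b = parent[b];
        while (a != b) { a = parent[a]; b = parent[b]; }
        return a;
    }
};

// When the anti-dependence walk reaches a Phi through input i, we do not
// raise the LCA all the way to the Phi's merge block; we conservatively
// fold in only the predecessor block that feeds input i.
int raise_lca_above_phi_input(const DomTree& dt, int lca_so_far,
                              int pred_of_input) {
    return dt.lca(lca_so_far, pred_of_input);
}
```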
> There are 2 kinds of kill, 1-kill and 2-kill (can think of as whole and part).
When do we 2-kill a node, but not 1-kill it? It seems like there is a missing case in your list here?
> we need to go up the memory graph through all merges
I am not convinced we need to go up through merges. We should only need to go downwards from the load's memory input. This currently breaks because, in the failing cases, the load has more than one valid memory input.
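For illustration, a toy model of the downward sweep I have in mind (invented node representation, not HotSpot's `Node` hierarchy): starting from the load's single memory input, every store consuming that same memory state is a potential anti-dependence; a Phi stops the sweep (the real code raises the LCA there instead); a merge is treated as transparent, so we sweep through it.

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <vector>

// Toy memory-graph node (invented; not HotSpot's Node hierarchy).
struct MemNode {
    enum Kind { kStore, kPhi, kMerge, kOther } kind;
    std::vector<MemNode*> uses;  // nodes consuming this memory state
};

// Downward sweep from the load's single memory input.
std::vector<MemNode*> find_anti_dep_stores(MemNode* initial_mem) {
    std::vector<MemNode*> stores;
    std::unordered_set<MemNode*> visited{initial_mem};
    std::queue<MemNode*> worklist;
    worklist.push(initial_mem);
    while (!worklist.empty()) {
        MemNode* n = worklist.front(); worklist.pop();
        for (MemNode* use : n->uses) {
            if (!visited.insert(use).second) continue;
            if (use->kind == MemNode::kStore) {
                stores.push_back(use);  // a store's output is a new memory
                                        // state, so do not sweep past it
            } else if (use->kind == MemNode::kMerge) {
                worklist.push(use);     // transparent for this slice
            }
            // kPhi, kOther: stop here in this toy model
        }
    }
    return stores;
}
```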
> which means we do not go horizontally, going from a Phi input to another Phi input
I don't quite understand what you mean here.
> Edit: We need to go down when iterating a memory node from below, too.
I'm lost here as well.
> Going down through a MergeMem.
> Going down through a Phi: This Phi must be dominated by n, or else n is undefined and we can skip it. Since this Phi must be dominated by n, it can either:
> Lies between the block of n and early, which we cover in our sweep
> Lies after early, then we just schedule the load above the Phi, skipping any anti-dependence search since they will be satisfied by this operation
The proposed changeset adds relevant Phis between `initial_mem` (`n`?) and `early` as search roots, which means we should not need to go down through any Phis.
> Going up through a Phi: Do we cover this?
No, and I argue we should not need to (see above).
> Going up through a MergeMem: I assume there should be no kill of the inputs of a MergeMem so we do not need to care about this?
We do currently go up through MergeMems at the start when defining `initial_mem`. This was a recent patch by @veresov.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22852#issuecomment-2615513488