RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2]

Emanuel Peter epeter at openjdk.org
Fri Jan 17 08:21:42 UTC 2025


On Thu, 9 Jan 2025 16:49:36 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

> I am not fully reassured we cover all cases either, but we at least now cover more cases than before (and specifically the cases that appear for this issue). See also the earlier comment by Roberto on investigating whether we can somehow enfore in earlier phases the invariant that the original version of PhaseCFG::insert_anti_dependences assumed: that we only need to search from one initial memory state. I would suggest doing this in a separate RFE.

Hmm. I guess we could allow your "ad hoc" looking fix. Still, I think you could improve the comment, so that if someone else hits a bug here they have an easier starting point.

Maybe you could say something like this:
We only add anti-dependency edges for `load -> store` and not for `load -> phi`. But below a `phi` there could be more `stores`, which must happen after the `load`.

Then say something more about how you find all the relevant `load`s with your forest traversal.

Another question: why not just traverse down through `phi` nodes, just like with the `MergeMem`s? That way, we could also find all `stores` that have to happen after a `load`, right? It seems to me that would be easier to make a proof for.

Actually, does your current algorithm with traversing all blocks on the idom change find all relevant `loads`? Maybe you can actually make a good argument here too. Maybe you can argue in which blocks the `load` could be scheduled (all from `early` to the initial_mem_block ?), and so we must ensure the anti-dependency edges for any `store` in those blocks that could alias. Am I getting this right?

If that is true, you could alternatively just search down through `MergeMem` and `Phi`, collecting all `stores` at the "leaves". But constrain the search to the blocks between `early` and `initial_mem_block`.

I'm not saying that you need to change your algorithm, I'm just trying to understand at this point. Maybe your solution is correct, and we can convince ourselves that it actually covers all cases if we come up with a good proof / argument.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1919713707


More information about the hotspot-compiler-dev mailing list