RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2]
Daniel Lundén
dlunden at openjdk.org
Fri Jan 17 20:02:44 UTC 2025
On Fri, 17 Jan 2025 08:11:41 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>>> I am not fully reassured we cover all cases either, but we at least now cover more cases than before (and specifically the cases that appear for this issue). See also the earlier comment by Roberto on investigating whether we can somehow enfore in earlier phases the invariant that the original version of PhaseCFG::insert_anti_dependences assumed: that we only need to search from one initial memory state. I would suggest doing this in a separate RFE.
>>
>> Hmm. I guess we could allow your "ad hoc" looking fix. Still, I think you could improve the comment, so that if someone else hits a bug here they have an easier starting point.
>>
>> Maybe you could say something like this:
>> We only add anti-dependency edges for `load -> store` and not for `load -> phi`. But below a `phi` there could be more `stores`, which must happen after the `load`.
>>
>> Then say something more about how you find all the relevant `load`s with your forest traversal.
>>
>> Another question: why not just traverse down through `phi` nodes, just like with the `MergeMem`s? That way, we could also find all `stores` that have to happen after a `load`, right? It seems to me that would be easier to make a proof for.
>>
>> Actually, does your current algorithm with traversing all blocks on the idom change find all relevant `loads`? Maybe you can actually make a good argument here too. Maybe you can argue in which blocks the `load` could be scheduled (all from `early` to the initial_mem_block ?), and so we must ensure the anti-dependency edges for any `store` in those blocks that could alias. Am I getting this right?
>>
>> If that is true, you could alternatively just search down through `MergeMem` and `Phi`, collecting all `stores` at the "leaves". But constrain the search to the blocks between `early` and `initial_mem_block`.
>>
>> I'm not saying that you need to change your algorithm, I'm just trying to understand at this point. Maybe your solution is correct, and we can convince ourselves that it actually covers all cases if we come up with a good proof / argument.
>
> About naming: I don't care which exact names you use, it would just be nice to have some clear definition of what means what at the beginning, so the reader does not have to reverse engineer it ;)
> Hmm. I guess we could allow your "ad hoc" looking fix. Still, I think you could improve the comment, so that if someone else hits a bug here they have an easier starting point.
I have made an attempt to clarify the approach in the comments (also now with a small example).
> Maybe you could say something like this:
We only add anti-dependency edges for load -> store and not for load -> phi. But below a phi there could be more stores, which must happen after the load.
I see what you mean, but this is not quite the problem. `load -> phi` will already, in the vast majority of cases, ensure that the load is scheduled before the relevant Phi inputs (and, transitively, before any anti-dependent stores after the Phi). The problem occurs when "before the relevant Phi inputs" are blocks even earlier than the early block. In this case, the Phis have no effect and we need to consider other search roots to find all relevant anti-dependent stores.
> Then say something more about how you find all the relevant loads with your forest traversal.
Right, check the revised comments and example and see if it makes more sense now.
> Another question: why not just traverse down through phi nodes, just like with the MergeMems? That way, we could also find all stores that have to happen after a load, right? It seems to me that would be easier to make a proof for.
That is an interesting observation! Please see the comment about the [alternative changeset](https://github.com/openjdk/jdk/compare/master...dlunde:jdk:insert-anti-dependences-8333393+rewrite) I developed in the PR description, which does more or less what you describe. There are many pitfalls when traversing down through Phis. Specifically, we must ensure that initial_mem is live (not overwritten) on all Phi inputs before we search past the Phi. This Phi property is not easy to discover and hence the alternative fix is more invasive and introduces a noticable compilation speed slowdown.
> Actually, does your current algorithm with traversing all blocks on the idom change find all relevant loads? Maybe you can actually make a good argument here too. Maybe you can argue in which blocks the load could be scheduled (all from early to the initial_mem_block ?), and so we must ensure the anti-dependency edges for any store in those blocks that could alias. Am I getting this right?
> If that is true, you could alternatively just search down through MergeMem and Phi, collecting all stores at the "leaves". But constrain the search to the blocks between early and initial_mem_block.
Sorry, I am lost here. I guess you mean relevant stores, not loads? There is a dominance relation initial_mem_block -> early -> LCA that always holds, and the load may (later, not as part of `PhaseCFG::insert_anti_dependences`) schedule between early and LCA (not between initial_mem_block and early). The only relevant anti-dependent stores for the load are between early and LCA, *but in order to find all such anti-dependent stores, it turns out we, in the general case, need to search starting from additional Phi memory states in between initial_mem and the early block*. Maybe my updated comments in the source code provide further clarification?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1920681050
More information about the hotspot-compiler-dev
mailing list