RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5]
Daniel Lundén
dlunden at openjdk.org
Tue Jan 21 10:35:39 UTC 2025
On Tue, 21 Jan 2025 06:21:33 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add example in comment
>
> Ah I see the logic here, currently, we just lift the load above a `Phi` regardless the liveness of the memory at that `Phi`, this covers all the `Phi`s from `early` to `LCA`, and this PR searches the remaining `Phi`s from `memory_block` to `early`. That seems to me will cover all the cases, albeit in an overly conservative manner.
Thanks for the comments @merykitty!
> IIUC the issue is that an anti-dependency search stops at PhiNodes while it must look through it because the memory is not dead at that Phi but it is killed later. The fact that the memory is alive at the Phi means that we do not make the load anti-dependent on the Phi but since we stop searching we fail to make the load anti-dependent on the memory kill later.
Yes, sounds about right. I had a similar point of view when developing the [alternative changeset](https://github.com/openjdk/jdk/compare/master...dlunde:jdk:insert-anti-dependences-8333393+rewrite) that I've mentioned earlier in this PR. In this alternative solution, I find the missing anti-dependences by expanding the memory graph search from `initial_mem`, without using any dominator information. Two particular challenges:
1. Searching through all Phis can lead to incorrect anti-dependence edges that cause circular dependencies in the `early` block. We need additional analysis to determine which Phis we can safely search through.
2. It is not enough to just search the memory graph downwards from `initial_mem`. In cases where `intial_mem` is a Phi, we must also search upwards to find all possible actual initial memory states (and then go downwards from them as well).
I can go into more detail if you wish. Taking care of the two items above is slow compared to the present solution.
> However, I cannot see the relation between the Phis we need to search and the scope you are searching. What does the early block have anything to do with the memory graph here? The early block takes into consideration the control input and the pointer input, too, which seems unrelated to the memory graph.
What we are doing is finding memory nodes that, in the context of the load we are looking at, are equivalent to `initial_mem`. That is, we could set the memory input (`initial_mem`) to the load to any of these memory nodes, and the graph would be semantically equivalent. The latest possible definition of `initial_mem` is in the `early` block, so going below that is unnecessary (maybe even incorrect, I need to think about it).
> For example, in your example 2, if I just arbitrarily remove the control input of 100 loadI, then how can your solution manage to find 76 Phi to do anti-dependence search on?
In that case, the early block is equal to the initial memory block, B2. We stop the search at 76 Phi and 77 Phi, and hoist the LCA above the last blocks on the edges (18 MachProj -> 76 Phi), (78 MergeMem -> 77 Phi), and (80 MergeMem -> 77 Phi). Looking at the CFG, the blocks are B5, B19, and B21. Their LCA is B4, so the load schedulable range is then between B2 and B4.
> I feel that in such cases the load is still anti-dependent on 58 membar_release, right?
Yes, but changing the `early` block changes everything. Now stopping at 76 Phi and 77 Phi actually has the intended effect. Very conservative of course (the optimal LCA is still B9), but at least correct.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22852#issuecomment-2604334088
More information about the hotspot-compiler-dev
mailing list