RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2]

Daniel Lundén dlunden at openjdk.org
Wed Jan 8 15:24:39 UTC 2025


On Wed, 8 Jan 2025 14:08:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/gcm.cpp line 757:
>> 
>>> 755:   // In some cases, there are other relevant initial memory states besides
>>> 756:   // initial_mem. In such cases, we are rather dealing with multiple trees and
>>> 757:   // their fringes.
>> 
>> If I look at these comments here (I reviewed a change by Roland a few months back, so my memory is coming back)...
>> I see that the load is supposed to be scheduled before any `Memory state modifying nodes include Store and Phi` that is (transitively via any MergeMem) below the `initial_mem`.
>> 
>> In you example1, why do we therefore not put an anti-dependency edge betweeen the `183 load`, and the `106 Phi`? Would that not be enough to ensure the load is scheduled before the other memory affecting nodes further below `106 Phi`?
>> 
>> Or is the issue that this traversal is somehow restricted to blocks - I don't remember that from last time...
>> I'll keep reading the changes now.
>
> And in example 2, we should schedule before the Phi as well:
> ![image](https://github.com/user-attachments/assets/3d035602-fe4b-4c34-98fe-d2935fed92e0)
> 
> Why don't we do that?

Thanks for the comments @eme64!

> In you example1, why do we therefore not put an anti-dependency edge betweeen the 183 load, and the 106 Phi? Would that not be enough to ensure the load is scheduled before the other memory affecting nodes further below 106 Phi?
>
> Or is the issue that this traversal is somehow restricted to blocks - I don't remember that from last time...
I'll keep reading the changes now.

Yes, Phis only result in LCA changes at the block level and we never add anti-dependence edges directly between the load and Phi nodes. In example 1, we do mark the last block in between the path from 107 Phi to 106 Phi (which is B27) for raising the LCA. However, when doing the joint LCA raising operation later on (`raise_LCA_above_marks`), we start at the original LCA and stop when we reach the early block (B20). Therefore, we never even consider B27. My very first attempt at solving this issue was to try and identify some dominance relation between the early block and the blocks for and in between 107 Phi and 106 Phi and use this information to force the LCA to the early block. This kind of worked at the block level, but we still need to identify somehow that we need an anti-dependence edge to 64 membar_release. Otherwise, it can happen that the load is scheduled correctly in the early block, but incorrectly (after an overwriting store) within the block (easily verified with `-XX:
 +StressLCM`).

> And in example 2, we should schedule before the Phi as well:
Why don't we do that?

Same here as above, we do tag both B19 and B21 for raising the LCA, but never consider them in `raise_LCA_above_marks` since they are above the early block B9.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1907363858


More information about the hotspot-compiler-dev mailing list