RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2]
Emanuel Peter
epeter at openjdk.org
Thu Jan 9 12:58:55 UTC 2025
On Tue, 7 Jan 2025 18:03:21 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:
>> When searching for load anti-dependences in GCM, it is not always sufficient to search starting only at the direct initial memory input to the load. Specifically, there are cases when we must also search for anti-dependences starting at relevant Phi memory nodes in between the load's early block and the initial memory input's block. Here, "in between" refers to blocks in the dominator tree in between the early and initial memory blocks.
>>
>> #### Example 1
>>
>> Consider the ideal graph below. The initial memory for 183 loadI is 107 Phi and there is an important anti-dependence on node 64 membar_release. To discover this anti-dependence, we must instead search from 119 Phi, which contains memory slices overlapping with those of 107 Phi. Looking at the ideal graph block view, we see that both 107 Phi and 119 Phi are in the initial memory block (B7) and thus dominate the early block (B20). If we only search from 107 Phi, we fail to add the anti-dependence on 64 membar_release and do not force the load to schedule before 64 membar_release as we should. In the block view, we see that the load is actually scheduled in B24 _after_ a number of anti-dependent stores, the first of which is in block B20 (corresponding to the anti-dependence on 64 membar_release). The result is the failure we see in this issue (we load the wrong value).
>>
>> ![failure-graph-1](https://github.com/user-attachments/assets/e5458646-7a5c-40e1-b1d8-e3f101e29b73)
>> ![failure-blocks-1](https://github.com/user-attachments/assets/a0b1f724-0809-4b2f-9feb-93e9c59a5d6a)
>>
>> #### Example 2
>>
>> There are also situations when we need to start searching from Phis that are strictly in between the initial memory block and early block. Consider the ideal graph below. The initial memory for 100 loadI is 18 MachProj, but we also need to search from 76 Phi to find that we must raise the LCA to the last block on the path between 76 Phi and 75 Phi: B9 (= the load's early block). If we do not search from 76 Phi, the load is again likely scheduled too late (in B11 in the example) after anti-dependent stores (the first of which corresponds to 58 membar_release in B10). Note that the block B6 for 76 Phi is strictly dominated by the initial memory block B2 and also strictly dominates the early block B9.
>>
>> ![failure-graph-2](https://github.com/user-attachments/assets/ede0c299-6251-4ff8-8b84-af40a1ee9e8c)
>> ![failure-blocks-2](https://github.com/user-attachments/assets/e5a87e43-b6fe-4fa3-8961-54752f63633e)
>>
>> ### Cha...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>
> Updates after comments
I'm getting closer to understanding what you are doing 😅
I have some more questions and suggestions.
I think @rwestrel should also have a look at this; he recently fixed a bug in this code.
src/hotspot/share/opto/gcm.cpp line 781:
> 779: // If the load has an explicit control input, walk up the dominator tree
> 780: // from the early block (inclusive) to the initial memory block
> 781: // (inclusive). If we in a block find memory Phi(s) that can alias
"If we in a block find" sounds a little strange.
Suggestion:
// (inclusive). When traversing the blocks, we look for Phi(s) that can alias
src/hotspot/share/opto/gcm.cpp line 789:
> 787: // initial_mem_block->_idom). The loop below always terminates because the
> 788: // root block strictly dominates initial_mem_block.
> 789: while (b != initial_mem_block->_idom) {
Could you write a `for` loop instead?
`for (Block* b = early; b != initial_mem_block->_idom; b = b->_idom) {`
Having the initialization, exit check, and iteration step together makes it a little more readable, I think.
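To illustrate the two forms side by side, here is a minimal standalone sketch. It is not the code from `gcm.cpp`: the `Block` struct is a hypothetical stand-in that models only the `_idom` link, the `id` field and the helper names `walk_while` and `walk_for` are invented for illustration, and I am assuming the `while` body ends with `b = b->_idom` (the quoted diff does not show it). Both helpers record the blocks visited when walking up the dominator tree from the early block (inclusive) to the initial memory block (inclusive).

```cpp
#include <vector>

// Hypothetical stand-in for HotSpot's Block. Only the immediate-dominator
// link is modelled; `id` exists purely so the walk can be observed.
struct Block {
  int id;
  Block* _idom;  // immediate dominator, nullptr for the root block
};

// While-loop form. Assumption: the loop body advances with b = b->_idom.
std::vector<int> walk_while(Block* early, Block* initial_mem_block) {
  std::vector<int> visited;
  Block* b = early;
  while (b != initial_mem_block->_idom) {
    visited.push_back(b->id);  // placeholder for the per-block Phi search
    b = b->_idom;
  }
  return visited;
}

// For-loop form: initialization, exit check, and step sit on one line.
std::vector<int> walk_for(Block* early, Block* initial_mem_block) {
  std::vector<int> visited;
  for (Block* b = early; b != initial_mem_block->_idom; b = b->_idom) {
    visited.push_back(b->id);  // placeholder for the per-block Phi search
  }
  return visited;
}
```

With a simplified dominator chain root, B2, B6, B9 (loosely modelled on Example 2, assuming each block is the immediate dominator of the next), both helpers called with early = B9 and initial memory block = B2 visit B9, B6, B2 in that order. As in the original comment, termination relies on `initial_mem_block` dominating `early`.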
src/hotspot/share/opto/gcm.cpp line 793:
> 791: if (b == initial_mem_block && !initial_mem->is_Phi()) {
> 792: // If we are in the initial memory block, and initial_mem is not itself
> 793: // a Phi, no Phis in the block can be initial memory states.
I'm confused when I read this. As said above, we need a clear definition of `initial`.
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22852#pullrequestreview-2539792186
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908705090
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908737344
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908729513