RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges
Roberto Castañeda Lozano
rcastanedalo at openjdk.org
Mon Dec 23 11:13:43 UTC 2024
On Fri, 20 Dec 2024 18:26:55 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:
> When searching for load anti dependences in GCM, it is not always sufficient to just search starting at the direct initial memory input to the load. Specifically, there are cases when we must also search for anti dependences starting at relevant Phi memory nodes in between the load's early block and the initial memory input's block. Here, "in between" refers to blocks in the dominator tree in between the early and initial memory blocks.
>
> #### Example 1
>
> Consider the ideal graph below. The initial memory for 183 loadI is 107 Phi and there is an important anti dependency for node 64 membar_release. To discover this anti dependency, we must rather search from 119 Phi which contains overlapping memory slices with 107 Phi. Looking at the ideal graph block view, we see that both 107 Phi and 119 Phi are in the initial memory block (B7) and thus dominate the early block (B20). If we only search from 107 Phi, we fail to add the anti dependency to 64 membar_release and do not force the load to schedule before 64 membar_release as we should. In the block view, we see that the load is actually scheduled in B24 _after_ a number of anti-dependent stores, the first of which is in block B20 (corresponding to the anti dependency on 64 membar_release). The result is the failure we see in this issue (we load the wrong value).
>
> 
> 
>
> #### Example 2
>
> There are also situations when we need to start searching from Phis that are strictly in between the initial memory block and early block. Consider the ideal graph below. The initial memory for 100 loadI is 18 MachProj, but we also need to search from 76 Phi to find that we must raise the LCA to the last block on the path between 76 Phi and 75 Phi: B9 (= the load's early block). If we do not search from 76 Phi, the load is again likely scheduled too late (in B11 in the example) after anti-dependent stores (the first of which corresponds to 58 membar_release in B10). Note that the block B6 for 76 Phi is strictly dominated by the initial memory block B2 and also strictly dominates the early block B9.
>
> 
> 
>
> ### Changeset
>
> - Update `PhaseCFG::insert...
Good analysis, Daniel! Given the presence of overlapping memory phis (memory phis that are placed in the same block and include aliasing memory slices), the general idea of this fix seems reasonable to me. As a more fundamental solution, it would be worth investigating (perhaps separately) the root cause of this overlap and exploring whether it is feasible to enforce disjointness (an invariant apparently assumed by the original `PhaseCFG::insert_anti_dependences` algorithm), at least during code generation.
Does the comment above the definition of `initial_mem` require any update as part of this change?
src/hotspot/share/opto/gcm.cpp line 773:
> 771: worklist_def_use_mem_states.push(nullptr, initial_mem);
> 772: Block* initial_mem_block = get_block_for_node(initial_mem);
> 773: if (load->in(0) && initial_mem_block != nullptr) {
Suggestion:
if (load->in(0) != nullptr && initial_mem_block != nullptr) {
src/hotspot/share/opto/gcm.cpp line 773:
> 771: worklist_def_use_mem_states.push(nullptr, initial_mem);
> 772: Block* initial_mem_block = get_block_for_node(initial_mem);
> 773: if (load->in(0) && initial_mem_block != nullptr) {
What would be the conditions needed for `initial_mem_block == nullptr`? (I did a quick run with an additional assertion and could not find any). It would be great to narrow down this to understand better the completeness of the fix and convince ourselves we are not leaving interesting cases unaddressed.
src/hotspot/share/opto/gcm.cpp line 778:
> 776: // block. If we in a block find memory Phi(s) that can alias initial_mem,
> 777: // these are also potential initial memory states and there may be further
> 778: // required anti dependences due to them.
For consistency with the rest of `gcm.cpp`:
Suggestion:
// required anti-dependences due to them.
src/hotspot/share/opto/gcm.cpp line 783:
> 781: // Stop searching when we run out of dominators (b == nullptr) or when we
> 782: // step past the initial memory block (b == initial_mem_block->_idom).
> 783: while (b != nullptr && b != initial_mem_block->_idom) {
Can `b` be `nullptr` here? The highest up we can go in the dominator tree is the start block, whose immediate dominator is the root block, no?
src/hotspot/share/opto/gcm.cpp line 785:
> 783: while (b != nullptr && b != initial_mem_block->_idom) {
> 784: if (b == initial_mem_block && !initial_mem->is_Phi()) {
> 785: break;
Could you add a brief code comment here explaining why the early break?
src/hotspot/share/opto/gcm.cpp line 788:
> 786: }
> 787: for (uint i = 0; i < b->number_of_nodes(); ++i) {
> 788: Node* n = b->get_node(i);
For sanity (and efficiency), we might want to (in a separate RFE, and if at all possible) make GCM insert Phi nodes to blocks before any Mach node, and add an early break here. Meanwhile, could you add a comment here explaining that we have to traverse the entire block because Phi nodes might be interleaved with Mach nodes as LCM may not have run yet?
src/hotspot/share/opto/gcm.cpp line 790:
> 788: Node* n = b->get_node(i);
> 789: if (n->is_memory_phi() && C->can_alias(n->adr_type(), load_alias_idx)) {
> 790: worklist_def_use_mem_states.push(nullptr, n);
We may push `<nullptr, initial_mem>` again here, is that an issue? If not, maybe add a comment explaining why.
src/hotspot/share/opto/gcm.cpp line 799:
> 797: def_mem_state = use_mem_state; // It's not a possibly interfering store.
> 798: if (use_mem_state == initial_mem)
> 799: initial_mem = nullptr; // only process initial memory once
Could you explain these changes?
-------------
PR Review: https://git.openjdk.org/jdk/pull/22852#pullrequestreview-2520358408
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895602921
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895606225
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895608702
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895612033
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895612765
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895616119
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895617636
PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1895618712
More information about the hotspot-compiler-dev
mailing list