RFR: 8347018: C2: assert(find_block_for_node(self->in(0)) == early) failed: The home of a memory writer must also be its earliest placement

Christian Hagedorn chagedorn at openjdk.org
Mon Jan 13 14:06:10 UTC 2025


## Failing Assert

The failing assert in `PhaseCFG::schedule_late()` checks the following:
https://github.com/openjdk/jdk/blob/55c6904e8f3d02530749bf28f2cc966e8983a984/src/hotspot/share/opto/gcm.cpp#L1431-L1438

In the test case, this is violated for `87 storeI`:
![image](https://github.com/user-attachments/assets/6844e060-69c4-436e-ac47-c14ec59ee4d2)

The early block `early` for `87 storeI` is bound by `115 loadI` pinned at `161 Region` which is dominated by the control input `146 Region` of `87 storeI`. This lets the assert fail.

## How Did `115 loadI` End up Being Pinned below `87 storeI`?
### Before Pre/Main/Post Loop Creation
Before the creation of pre/main/post loops, we have the following graph:

![image](https://github.com/user-attachments/assets/3f6e6b92-5194-4efd-89a9-246a977e3022)

Everything looks fine: The control input of `312 StoreI` (which is eventually cloned and becomes `87 storeI` in the Mach graph) corresponds to the early placement of the store. `415 LoadI` was hoisted out of the loop during Loop Predication and is pinned above at a Template Assertion Predicate.

### Pre/Main/Post Loop Creation
#### Post Loop Body Creation
During the creation of pre/main/post loops, we clone the main loop body for the post loop body:

![image](https://github.com/user-attachments/assets/e8de3d6d-34c3-46fd-abd4-df212e2e77fb)

We notice that `312 StoreI` is pinned on the main loop backedge. When finishing the last iteration from the main loop and possibly continuing in the post loop, we need to feed everything on the loop backedge of the main loop to the post loop. However, the pinned nodes on the main loop backedge cannot float. Therefore, we need to create new copies of these pinned nodes with `PhaseIdealLoop::clone_up_backedge_goo()`. 

The pins are updated to the entry of the post loop. All inputs into these pinned nodes that have their current control (fetched with `get_ctrl()`) on the main loop backedge as well are also cloned but keep their control inputs (if any) if it's not the loop backedge.

In our example, this applies to `453 StoreI` -> `479 StoreI`, and some inputs recursively (`454 AddI` -> `482 AddI`, `481 LoadI` -> `541 Load`):

![image](https://github.com/user-attachments/assets/0d4b818b-5c23-4ad9-b474-976c352f4c88)

Still, all looks fine. Notice that the clone `481 LoadI` of `455 LoadI` is currently still pinned at the same Template Assertion.

#### Assertion Predicate Creation
In the next step, we create new Assertion Predicates at the post loop and rewire any data nodes control dependent on Assertion Predicates down to the post loop - including the new `481 LoadI` from `PhaseIdealLoop::clone_up_backedge_goo()`:

![image](https://github.com/user-attachments/assets/da529f25-de01-4ebc-a051-e9a8132f92d3)

This creates the graph shape with which we are then later failing during scheduling in the backend: The control input of `479 StoreI` further up in the graph as the actual early block limited by `481 LoadI` pinned at `493 IfTrue`.

## Same Problem with `clone_up_backedge_goo()` for Main Loop?
The very same problem could theoretically also be observed for the main loop when creating the pre loop. But it is not due to how we implemented the rewiring of data nodes when creating new Assertion Predicates:

After the pre loop is created, the old Assertion Predicates are above the pre loop and actually need to be established at the main loop. Therefore, all data nodes control dependent on Assertion Predicates and belonging to the main loop need to be rewired. In our test case, this is `415 LoadI` (original node) and `540 LoadI` (cloned node by `clone_up_backedge_goo()` actually belonging to main loop):

![image](https://github.com/user-attachments/assets/5898aa1a-fc38-45f8-9573-5b3c982202b3)

### Check If Data Belongs to Main Loop
Since the pre loop only contains cloned nodes we do the following trick to determine if a node belongs to the main loop (implemented [here](https://github.com/chhagedorn/jdk/blob/3b9732edc6dd22868634166678d220bf1066e5be/src/hotspot/share/opto/predicates.hpp#L964-L973)):

Store index IDX for the next newly created node just before pre loop creation.
For any data node dependency n:
  Is index of n < IDX? -> Not a node in the pre loop
    Is there a clone of n with index >= IDX? -> Clone is in pre loop and thus original node in main loop  

### Cloned Nodes with `clone_up_backedge_goo()` Mess with "Node inside Main Loop" Check
Since the cloned nodes in `clone_up_backedge_goo()` are originally from pre loop nodes, our check will fail and we do not rewire these nodes, even though they belong to the main loop:

"540 LoadI < IDX" does not hold
=> we conclude 540 LoadI is a cloned node belonging to the pre loop and not the main loop 


Applied to our test case, we have the following after `clone_up_backedge_goo()`:

![image](https://github.com/user-attachments/assets/2216f446-cef3-4f66-80a1-f07012160c59)

We can see that `540 LoadI`, cloned by `clone_up_backedge_goo()`, is still pinned before the pre loop because we have not rewired it and thus scheduling does not fail with the assert. 

Even though I could not trigger a failure, I think it is an incorrect pin since the `540 LoadI` belongs to the main loop.

## Proposed Fix
- Rewire any nodes created by `clone_up_backedge_goo()` which are pinned to the original loop entry before Assertion Predication to the new loop entry after Assertion Predicate creation. The new loop entry will be the the tail of the last Assertion Predicate (if any).
- Update data node rewiring in Assertion Predication processing to also consider nodes from `clone_up_backedge_goo()` correctly. I've implemented a new `NodeInMainLoopBody` class for that purpse.

### Why not just Add Assertion Predicates First?
This does not work straight forward because we do not know the init value before applying `clone_up_backedge_goo()` which is interleaved with updating the phi nodes. I've decided to go with the proposed fix instead. 

## Testing:
- tier1-7
- hs-precheckin-comp
- hs-comp-stress

## Deferring to JDK 25?
This seems to be an edge case (only found with fuzzing) and it's not entirely clear to me what the impact on product builds is. However, this is a regression in JDK 24 and should be considered to be fixed in JDK 24. But this fix became somewhat more complex to understand and implement. First applying the fix to JDK 25, letting it bake and then only considering it for an update release of JDK 25 could be a possible option I think. Opinions are welcomed.

Thanks,
Christian

-------------

Commit messages:
 - New NodeInMainLoopBody
 - 8347018: C2: assert(find_block_for_node(self->in(0)) == early) failed: The home of a memory writer must also be its earliest placement

Changes: https://git.openjdk.org/jdk/pull/23071/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23071&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8347018
  Stats: 154 lines in 4 files changed: 134 ins; 9 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/23071.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23071/head:pull/23071

PR: https://git.openjdk.org/jdk/pull/23071


More information about the hotspot-compiler-dev mailing list