RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop
Benoît Maillard
bmaillard at openjdk.org
Fri Sep 12 07:25:06 UTC 2025
This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
### Context
The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored:
static public void test() {
x = 0;
for (int i = 0; i < 20000; i++) {
x += i;
}
x = 0;
}
After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late.
This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).
### Detailed Analysis
In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
This is what the IR looks like after the creation of the post loop in our reproducer:
<img width="1720" height="1908" alt="image" src="https://github.com/user-attachments/assets/ae074f45-4239-4664-99dd-ec247af39da5" />
On the screenshot, node `118 StoreI` takes directly `24 StoreI` as memory input, even though it is obvious that `96 CountedLoopEnd` (to which `73 NodeI` is attached) is a predecessor of `114 CountedLoopEnd` in the CFG.
After that, we observe a succession of IGVN optimizations that eventually lead to the generation of wrong code:
- The `IfFalse` projection of `128 If` becomes dead, as the the _post_ loop is always executed (number of iterations is known)
- `121 Region` and `123 Phi` are subsequently eliminated (as a result of the dead path)
- Because the `Phi` disappeared, `118 StoreI` becomes the memory input of `89 StoreI`
- `118 StoreI` is eliminated because it is directly followed by a write at the same memory location
- `89 StoreI` is replaced by `24 StoreI` as an `Identity` optimizations because it is stores the same value at the same location
Node `89 StoreI` corresponds to the last `x = 0` assignment, and its elimination directly causes the wrong result (the store node from the `OuterStripMinedLoop` remains, as it is used by the safepoint).
### Proposed Fix
As mentioned previously, the impact of the missing `Phi` nodes need to be investigated further, as it it likely that this causes other bugs in the compilation process. This is a "local fix" for the specific issue of `Store` nodes moved out of the inner loop.
The approach here is to do the wiring directly in `PhaseIdealLoop::insert_post_loop`, right after having done the usual rewiring based on the `Phi` nodes. As the conditions for moving `Store` nodes out of the loop are quite restrictive, the pattern is predictable: `Store` nodes are attached to the `false` projection of the inner `CountedLoopEnd`, right before the safepoint in the CFG.
In the simplest case, the memory input of new version of the store node is outside of the loop body. In the cloned node, we change it to point to its original version instead (as the original store is always executed before).
It may also be that the memory input of the new node points to another memory node in the loop body. This can happen in the case where we have:
for (int i = 0; i < 20000; i++) {
a1.field += i;
a2.field += i;
}
Here, the second store has the first one as memory input, as `a1` and `a2` may be aliases. In this case, we only need to change the memory input of the first store in the chain, and it needs to point to the last memory node in the chain in the original version of the loop.
### Testing
- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8364757)
- [x] tier1-4, plus some internal testing
Thank you for reviewing!
-------------
Commit messages:
- Fix bad test headers after remaining
- Fix trailing whitespace
- Add jtreg tests
- Fix logic after failing TestStoresSunkInOuterStripMinedLoop
- 8364757: First attempt at fixing the store node issue
Changes: https://git.openjdk.org/jdk/pull/27225/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8364757
Stats: 141 lines in 3 files changed: 141 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/27225.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225
PR: https://git.openjdk.org/jdk/pull/27225
More information about the hotspot-compiler-dev
mailing list