RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop

Emanuel Peter epeter at openjdk.org
Tue Sep 16 09:14:50 UTC 2025


On Thu, 11 Sep 2025 13:05:21 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...

Thanks for working on this @benoitmaillard !

And thanks for all the explanations.

It seems the missing Phi at the OuterStripMinedLoop are a decision that implies that Stores will just sort of "hang" between loop exit and SafePoint. That is now the new "invariant". Fine for now, but we may want to reconsider adding the Phi for the OuterStripMinedLoop eventually.

I have read through the PR, and was a little confused about names, so bear with my comments 😅 

On the algo level I was wondering if it is possible to have a chain of stores between the exit and SafePoint? Do you have such examples?

src/hotspot/share/opto/loopTransform.cpp line 1679:

> 1677:       Node* next = out->fast_out(l);
> 1678:       if (next->is_Mem() && next->in(MemNode::Memory) == out) {
> 1679:         IdealLoopTree* output_loop = get_loop(get_ctrl(next));

I would keep the names for `next` and `output_loop` consistent. Maybe `next_loop`? Or just call them `use` and `use_loop`?

src/hotspot/share/opto/loopTransform.cpp line 1692:

> 1690:   }
> 1691:   return out;
> 1692: }

Note from later me: I was quite confused here. I thought this was going to be some general function that should handle all sorts of memory flow in the loop, but that is not the case. I'll leave all my comments here just to show you what I as the reader thought when reading it ;)

Below, in a code comment you say that this method does:
`Find the last memory node in the loop when following memory usages`

What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?


store
 |
 +--------+
 |        |
store   store
 |        |
 +---+ +--+
     | |
     phi
      |
    store -> the last one in the loop

I wonder if this is somehow possible. There are surely some IGVN optimizations that would common the stores here, and so the graph would probably have to be even more complicated. But I'm simply wondering if it could be possible that we would have branches / phis in the memory graph. Or what guarantees us that the graph is really linear here?

I'm also not sure how to parse the method name:
`find_mem_out_outer_strip_mined`
- find "mem out" <something> outer-strip-mined <loop?>
- find mem outside of outer-strip-mined loop?

src/hotspot/share/opto/loopTransform.cpp line 1788:

> 1786:   // right after the execution of the inner CountedLoop.
> 1787:   // We have to make sure that such stores in the post loop have the right memory inputs from the main loop
> 1788:   if (loop->tail()->in(0)->is_BaseCountedLoopEnd()) {

Out of curiosity: when would this condition be false?

src/hotspot/share/opto/loopTransform.cpp line 1793:

> 1791:     for (DUIterator j = if_false->outs(); if_false->has_out(j); j++) {
> 1792:       Node* store = if_false->out(j)->isa_Store();
> 1793:       // We don't make changes if the memory input is in the loop body as well

Why? I suppose that is because there must be a Phi in the loop then, right? Maybe state that in the comment here.

src/hotspot/share/opto/loopTransform.cpp line 1794:

> 1792:       Node* store = if_false->out(j)->isa_Store();
> 1793:       // We don't make changes if the memory input is in the loop body as well
> 1794:       if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {

Suggestion:

      if (store != nullptr && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {

No implicit null or zero checks, see hotspot style guide ;)

src/hotspot/share/opto/loopTransform.cpp line 1797:

> 1795:         Node* mem_out = find_mem_out_outer_strip_mined(store, outer_loop);
> 1796:         Node* store_new = old_new[store->_idx];
> 1797:         store_new->set_req(MemNode::Memory, mem_out);

Could it be that there are multiple stores in a chain after the loop exit and before the SafePoint?

Loop
Exit
store1
store2
store3
SafePoint

If so, they all have the same control, namely at the `if_false`.
Their memory state should be ordered, where store2 depends on store1 and store3 on store2. Only store1 should then really have its memory input updated.

Your code now finds the `store_new` for each of store1, store2 and store3, and sets all of their memory inputs to `mem_out`. But that means that the "new" stores all have the same memory input, and are not in a chain any more. Did I see this right? Is that ok?

src/hotspot/share/opto/loopnode.hpp line 1384:

> 1382: 
> 1383:   // Find the last memory node in the loop when following memory usages
> 1384:   Node *find_mem_out_outer_strip_mined(Node* store, IdealLoopTree* outer_loop);

The name of the method is a bit confusing. And the comment seems to suggest something different than what the code says.

test/hotspot/jtreg/compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java line 77:

> 75:         a1.field = 0;
> 76:         a2.field = 0;
> 77:     }

Do the field stores both float out of the loop, and end up in a chain between exit and safepoint? Might be nice to add some comments to these tests so we can see what examples you already cover and if we might need some more.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27225#pullrequestreview-3228308675
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351475787
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351447559
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351489419
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351520064
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351496858
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351551611
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351410690
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351608304


More information about the hotspot-compiler-dev mailing list