RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop

Tue Sep 16 09:14:52 UTC 2025

On Tue, 16 Sep 2025 08:30:05 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
>> 
>> ### Context
>> 
>> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored:
>> 
>> 
>>     static public void test() {
>>         x = 0;
>>         for (int i = 0; i < 20000; i++) {
>>             x += i;
>>         }
>>         x = 0;
>>     }
>> 
>> 
>> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late.
>> 
>> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).
>> 
>> ### Detailed Analysis
>> 
>> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
>> 
>> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
>> 
>> This is wh...
>
> src/hotspot/share/opto/loopTransform.cpp line 1692:
> 
>> 1690:   }
>> 1691:   return out;
>> 1692: }
> 
> Note from later me: I was quite confused here. I thought this was going to be some general function that should handle all sorts of memory flow in the loop, but that is not the case. I'll leave all my comments here just to show you what I as the reader thought when reading it ;)
> 
> Below, in a code comment you say that this method does:
> `Find the last memory node in the loop when following memory usages`
> 
> What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?
> 
> 
> store
>  |
>  +--------+
>  |        |
> store   store
>  |        |
>  +---+ +--+
>      | |
>      phi
>       |
>     store -> the last one in the loop
> 
> I wonder if this is somehow possible. There are surely some IGVN optimizations that would common the stores here, and so the graph would probably have to be even more complicated. But I'm simply wondering if it could be possible that we would have branches / phis in the memory graph. Or what guarantees us that the graph is really linear here?
> 
> I'm also not sure how to parse the method name:
> `find_mem_out_outer_strip_mined`
> - find "mem out" <something> outer-strip-mined <loop?>
> - find mem outside of outer-strip-mined loop?

I suppose we would trigger your assert if we found a branch:
`assert(unique_next == nullptr, "memory node should only have one usage in the loop body");`

Now we usually only do pre-main-post for relatively small loop bodies, see `LoopUnrollLimit`. But I wonder if we ever decided to increase this limit, would we then encounter such more complicated memory graphs?

> src/hotspot/share/opto/loopTransform.cpp line 1794:
> 
>> 1792:       Node* store = if_false->out(j)->isa_Store();
>> 1793:       // We don't make changes if the memory input is in the loop body as well
>> 1794:       if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
> 
> Suggestion:
> 
>       if (store != nullptr && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
> 
> No implicit null or zero checks, see hotspot style guide ;)

The loop nesting check looks a bit convoluted. Consider refactoring a little. Could you get rid of the `!` by swapping things around?
`get_loop(get_ctrl(store->in(MemNode::Memory))))->is_member(outer_loop)`
Does not look that much better either... hmm.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351468893
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351511009