RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop
Benoît Maillard
bmaillard at openjdk.org
Mon Sep 22 15:26:06 UTC 2025
On Tue, 16 Sep 2025 08:53:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
>>
>> ### Context
>>
>> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored:
>>
>>
>> static public void test() {
>> x = 0;
>> for (int i = 0; i < 20000; i++) {
>> x += i;
>> }
>> x = 0;
>> }
>>
>>
>> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late.
>>
>> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).
>>
>> ### Detailed Analysis
>>
>> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
>>
>> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
>>
>> This is wh...
>
> src/hotspot/share/opto/loopTransform.cpp line 1797:
>
>> 1795: Node* mem_out = find_mem_out_outer_strip_mined(store, outer_loop);
>> 1796: Node* store_new = old_new[store->_idx];
>> 1797: store_new->set_req(MemNode::Memory, mem_out);
>
> Could it be that there are multiple stores in a chain after the loop exit and before the SafePoint?
>
> Loop
> Exit
> store1
> store2
> store3
> SafePoint
>
> If so, they all have the same control, namely at the `if_false`.
> Their memory state should be ordered, where store2 depends on store1 and store3 on store2. Only store1 should then really have its memory input updated.
>
> Your code now finds the `store_new` for each of store1, store2 and store3, and sets all of their memory inputs to `mem_out`. But that means that the "new" stores all have the same memory input, and are not in a chain any more. Did I see this right? Is that ok?
Yes, this can happen. This is actually what we test with the last test case (`test3`), and this is why we have the following:
```c++
// We don't make changes if the memory input is in the loop body as well
if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
In this case, the condition is only true for `store1` (as its memory input would be last memory operation before the loop, or the memory `Parm`), but not for `store2` nor `store3`. We would only end up rewiring `store1`, and leave `store2` and `store3` as they are.
Does that make sense?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2368979720
More information about the hotspot-compiler-dev
mailing list