RFR: JDK-8287284: C2: loop optimization performs split_thru_phi infinitely many times

Mon Oct 23 11:09:46 UTC 2023

### Node* phi = PhaseIdealLoop::split_thru_phi(Node* n, Node* region, int policy) 
The function `PhaseIdealLoop::split_thru_phi` splits Node `n` through a merge point (phi) if there is enough "win". First a clone x<sub>i</sub> of `n` is created for each input of the merge point. The general idea is that we want to replace enough of the x<sub>i</sub> clones with existing nodes or constants. The `policy` indicates how many "wins" are needed for success. A "win" is defined as one of the following:
- Identity(x<sub>i</sub>) has singleton type (i.e. a constant)
- Identity(x<sub>i</sub>) returns an already existing node
- x<sub>i</sub> commons up with an existing node (i.e. hash_find(x<sub>i</sub> != null)

_Example_
Looking at the phi (119 Phi) of `n` (135 LoadI) a clone of `n` is created for each input to the phi: 
- x<sub>1</sub> (140 LoadI) for the input [1] (127 Phi)
- x<sub>2</sub> (141 LoadI) for the input [2] (90 StoreI)

If there are enough "wins" (win $\geq$ policy) the wins are applied and a new merge point `phi` (139 Phi) is returned. 
`n` (135 LoadI) is replaced with `phi` (139 Phi) outside of `PhaseIdealLoop::split_thru_phi` if successful.

Example with **n =** `135 LoadI`, **region =** `129 Loop`, **policy =** `1` returns **phi =** `139 Phi`
<img width="286" alt="split_phi" src="https://github.com/openjdk/jdk/assets/71546117/4a1bcd8d-bbfd-4267-9d2b-5ba4e057df72">
Success because 1 "win" occurs: `88 AddI` = Identity(`141 LoadI`)
<img width="186" alt="88_identity" src="https://github.com/openjdk/jdk/assets/71546117/af3cb37a-c575-445a-b8d4-967013dd3fb0">

### run() has infinite loop of split_thru_phi

// java -Xbatch -XX:-PartialPeelLoop -XX:CompileCommand=compileonly,Test::run Test.java

public class Test {

    public static int cnt = 1;

    public static void run() {
        int j = 0;
        do {
            j = cnt;
            for (int k = 0; k < 20000; k++) {
                cnt += 2;
            }

        } while (++j < 10);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++ ) {
            run();
        }
    }
}

`Compile::optimize_loops` ends up in an endless loop when optimizing run().
In each `PhaseIdealLoop iteration` a `LoadI` (e.g. 135) is split through a `Phi` (e.g. 119) and the newly created `LoadI` (e.g. 140) is again split through a `Phi` (e.g. 127) creating an endless repeating pattern of `LoadI` (e.g. 144) always connected to the circular dependent `Phis` (119, 127). This results in an every growing chain of new `Phis` (142, 139, 133). The process is only stopped because `LoopOptsCount` is reached after 43 iterations. The long chain of `Phis` stays throughout code generation.
<img width="596" alt="endless" src="https://github.com/openjdk/jdk/assets/71546117/5d0353a2-3212-403d-9f78-5bf60ac8df4e">

Lets look at the steps of `split_thru_phi` in more detail:

1. The graph of `run()` before the first `PhaseIdealLoop iterations` looks like the following:
![1 - init graph - split 135](https://github.com/openjdk/jdk/assets/71546117/b8663eb6-e10d-4450-9a48-e366c4b47f6e)

2. Running `build_and_optimize()`→`split_if_with_blocks()`→`split_if_with_blocks_pre(use)`→`split_thru_phi(Node* n, Node* region, int policy)` with **n =** `135 LoadI`, **region =** `139 Loop`. 
- `135 LoadI` is split through `119 Phi` if it results in a "win". 
- As a first step a clone of `135 LoadI` is created for each input of `119 Phi`:
   - clone `140 LoadI` for input `127 Phi`
   - `141 LoadI` for input `90 StoreI`
![2 - clone 140 141](https://github.com/openjdk/jdk/assets/71546117/e024e264-0b51-45da-9027-385d9750a8a3)

3. In this case we have a "win" because there exists an identity of `141 LoadI`:
- `88 AddI` = identity(`141 LoadI`)
![3 - 88=identity(141)](https://github.com/openjdk/jdk/assets/71546117/bb8a2e3e-1ccb-4ea7-b0fa-812436bd4cec)

4. As a final step of `split_thru_phi` (and only because we have a "win") `135 LoadI` is replaced with a new `139 Phi` that merges the clone `140 LoadI` and `88 AddI`, the identity of `141 LoadI` which gets cleaned up:
![4 - replace(135 with 139)](https://github.com/openjdk/jdk/assets/71546117/b861967e-9a2d-4056-bf9d-611377800a62)

5. Next `split_thru_phi(Node* n, Node* region, int policy)` is called with the previously created clone **n =** `140 LoadI`.
- `140 LoadI` is split through `127 Phi`
- Clone `143 LoadI` is created for `7 Parm`, the first input to `127 Phi`
![5 - split 140 and clone 143](https://github.com/openjdk/jdk/assets/71546117/ebbe989c-ffb9-4ad4-8f79-c1cb321c886a)

6. We have again a "win" for the clone `143 LoadI` because:
- `24 Loadl` = hash(`143 LoadI`)
![6 - 24=hash(143)](https://github.com/openjdk/jdk/assets/71546117/7bb9587e-8b28-4788-8a2f-cc1da97c2f14)

7. Next, clone `144 LoadI` is created for `119 Phi`, the second input to `127 Phi`. And finally, `140 LoadI` is replaced with a new `142 Phi` 
![7 - replace(140 with 142)](https://github.com/openjdk/jdk/assets/71546117/72f0fc45-5898-42d8-a506-0ca08fd1c068)
If `split_thru_phi` was called on a LoadI, region is a Loop and we have a "win" , we do `set_major_progress()` (at the end of `PhaseIdealLoop::split_if_with_blocks_pre`). Therefore we do another `PhaseIdealLoop iteration`.
https://github.com/openjdk/jdk/blob/903b9e8dd966fbb61222c59048b752ed8b42b608/src/hotspot/share/opto/loopopts.cpp#L1171-L1174

8. In the second `PhaseIdealLoop iteration` we call `split_thru_phi(Node* n, Node* region, int policy)` with **n =** `144 LoadI`. This is the same as in the first iteration: `44 LoadI` has the same inputs as `135 LoadI` in the first iteration. We find the exact same wins again because the involved nodes have not changed in the meantime. We repeat the work of the first iteration and `set major_progress` to true. For each following loop opts round, we repeat the same steps again and keep setting `major_progress` to true until we reach the maximum number of loop opts iterations.

=> **We are stuck in a loop!**
![8 - split 144](https://github.com/openjdk/jdk/assets/71546117/ba340d10-5031-4b30-b97b-63793fc65118)

9. After another call to `split_thru_phi` we have a chain of 4 phis that grows until `PhaseIdealLoop iterations` runs out of iterations.
![9 - chain of phis](https://github.com/openjdk/jdk/assets/71546117/c282b4d7-57af-4fa4-a90d-6d69afba52ed)

# Proposed Fix

The problem is that `split_thru_phi` finds a win when in reality it is not a win: it does not make much sense to move the LoadI back into the inner loop - Because that means adding extra work to the most common path. The cost function in `split_thru_phi` is now adjusted to exclude such false wins. This automatically also prevents the infinite loop.

---------
### Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer))
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue

### Reviewing
<details><summary>Using <code>git</code></summary>

Checkout this PR locally: \
`$ git fetch https://git.openjdk.org/jdk.git pull/15536/head:pull/15536` \
`$ git checkout pull/15536`

Update a local copy of the PR: \
`$ git checkout pull/15536` \
`$ git pull https://git.openjdk.org/jdk.git pull/15536/head`

</details>
<details><summary>Using Skara CLI tools</summary>

Checkout this PR locally: \
`$ git pr checkout 15536`

View PR using the GUI difftool: \
`$ git pr show -t 15536`

</details>
<details><summary>Using diff file</summary>

Download this PR as a diff file: \
<a href="https://git.openjdk.org/jdk/pull/15536.diff">https://git.openjdk.org/jdk/pull/15536.diff</a>

</details>

-------------

Commit messages:
 - moved logic to a function
 - remove comment
 - Update loopnode.hpp
 - Never move a LoadI from outer to inner loop
 - revert
 - JDK-8287284

Changes: https://git.openjdk.org/jdk/pull/15536/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15536&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8287284
  Stats: 33 lines in 2 files changed: 31 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/15536.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15536/head:pull/15536

PR: https://git.openjdk.org/jdk/pull/15536