RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop

Tue Apr 23 07:42:59 UTC 2024

Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop.

**Example where we get the "bad dominance"**

This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment).

The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop.

The address is parsed into its components by `VPointer`:
`VPointer[mem: 1648      LoadI, base:    1, adr: 1669,  base[   1] + offset(   0) + invar(   0) + scale(   4) * iv]`

We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` tries still to find a "base" adress `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`.

The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test).

![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755)

During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`:

![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c)

You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert.

**Why does this happen?**

Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but also the pre-loop. The pinning after pre-loop and before main-loop via `1513 CastLL` seems to be quite rare. The `CastLL` comes from the long `checkIndex` for the Unsafe / MemorySegment load, somehow in combination with the constant-size array / constant range of the main-loop.

Another realization: the `adr` is basically an addition `raw_base + offset`. I would have expected the `offset` to be an invariant and end up in the `invar`. But the `VPointer` parsing seems to only parse through the `AddP`, and stop at the `CastX2P`, and take this as the `adr`, even though we could parse further through the `1590 AddL`, and separate the `11 Parm = long offset` and the `602 LoadL = raw_base`.

**Solution**

For now, and for the simplicity of backports, I simply check that the `adr` is not just `is_loop_member` (of the main-loop), but that it is `invariant` (of the main-loop and the pre-loop). We also already do this check for invariants `invar`, and the base/adr is essencially another invariant. That solution means that in our regression tests, we would mark the problematic `VPointer`s as not valid, and they will not be vectorized. This case is very rare (after all, we have never hit this bad-dominance assert before), and therefore there should not be any relevant performance regression.

In a **future RFE**, I can then look into improving the `VPointer` parsing. The `VPointer` code is already very convoluted, and adding much more functionality will be difficult to manage. I want to completely refactor `VPointer` in a few months anyway. With improved parsing, this regression test could vectorize again.

-------------

Commit messages:
 - fix and more test runs
 - ms test
 - 8330819

Changes: https://git.openjdk.org/jdk/pull/18892/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18892&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330819
  Stats: 163 lines in 3 files changed: 162 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18892.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18892/head:pull/18892

PR: https://git.openjdk.org/jdk/pull/18892