RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop [v2]

Tue Apr 23 13:32:37 UTC 2024

On Tue, 23 Apr 2024 08:52:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop.
>> 
>> **Example where we get the "bad dominance"**
>> 
>> This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment).
>> 
>> The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop.
>> 
>> The address is parsed into its components by `VPointer`:
>> `VPointer[mem: 1648      LoadI, base:    1, adr: 1669,  base[   1] + offset(   0) + invar(   0) + scale(   4) * iv]`
>> 
>> We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` tries still to find a "base" adress `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`.
>> 
>> The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test).
>> 
>> ![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755)
>> 
>> During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`:
>> 
>> ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c)
>> 
>> You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert.
>> 
>> **Why does this happen?**
>> 
>> Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but als...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   test updates

That looks good to me.

> In a future RFE, I can then look into improving the VPointer parsing. The VPointer code is already very convoluted, and adding much more functionality will be difficult to manage. I want to completely refactor VPointer in a few months anyway. With improved parsing, this regression test could vectorize again.

Sounds good! Maybe you can link this bug to this RFE (if there is already one) to not forget about updating this test to check if it indeed vectorizes.

test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentMainLoopAlignment.java line 32:

> 30:  * @modules java.base/jdk.internal.misc
> 31:  * @modules java.base/jdk.internal.util
> 32:  * @library /test/lib /

Since you don't use anything from the test libs, I think you can remove this line.
Suggestion:

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18892#pullrequestreview-2017223726
PR Review Comment: https://git.openjdk.org/jdk/pull/18892#discussion_r1576235069