RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v5]

Tue Aug 12 17:20:17 UTC 2025

On Tue, 12 Aug 2025 17:08:50 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> A loop of the form
>> 
>> MemorySegment ms = {};
>> for (long i = 0; i < ms.byteSize() / 8L; i++) {
>>     // vectorizable work
>> }
>> 
>> does not vectorize, whereas
>> 
>> MemorySegment ms = {};
>> long size = ms.byteSize();
>> for (long i = 0; i < size / 8L; i++) {
>>     // vectorizable work
>> }
>> 
>> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*:
>> 
>> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176
>> 
>> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization.
>> 
>> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization.
>> 
>> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem
>> 
>> ## Change Description
>> 
>> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge.
>> 
>> <details><summary>Explored Alternatives</summary>
>> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops.
>> 2. I...
>
> Manuel Hässig has updated the pull request incrementally with eight additional commits since the last revision:
> 
>  - Better documentation of profitable()
>  - Remove vector sizes
>  - Specify vector sizes
>  - Merge branch 'jdk-8356176-byte-size' of github.com:mhaessig/jdk into jdk-8356176-byte-size
>  - Add asserts
>  - Make region a field
>  - Even more better debug print
>  - Remove redundant scenarios

Ok, things are improving nicely 😊

src/hotspot/share/opto/loopnode.hpp line 1678:

> 1676:       // profitable if the win of a split is not on the entry edge, as such wins
> 1677:       // only pay off once and have a high chance of messing up the loop structure.
> 1678:       return (_loop_entry_wins == 0 && _total_wins > policy) ||

It may be good if you also mention that this applies not just to Loops with no entry wins, but also to non-loop Regions.
Actually, I would suggest that you move the comment from above down to this section.

    // In general this means that the split has to have more wins than specified
    // in the policy. However, for loops we need to take into account where the
    // wins happen.

src/hotspot/share/opto/loopnode.hpp line 1686:

> 1684:       // split. This is needed when we split a node and then must also split a
> 1685:       // dependant node, i.e. spliting a Bool node after splitting a Cmp node.
> 1686:              policy < 0;

It seems to me that `policy < 0` actually implies `_loop_back_wins > policy`, because `policy < 0 <= _loop_back_wins`.

Maybe it is still better to keep it, so we are explicit.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26429#pullrequestreview-3111933820
PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270602622
PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2270599371