RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression

Wed Jul 30 12:05:36 UTC 2025

A loop of the form

MemorySegment ms = {};
for (long i = 0; i < ms.byteSize() / 8L; i++) {
    // vectorizable work
}

does not vectorize, whereas

MemorySegment ms = {};
long size = ms.byteSize();
for (long i = 0; i < size / 8L; i++) {
    // vectorizable work
}

vectorizes. The reason is that the loop whose limit is hoisted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop needs some cleanup before it is recognized as counted. Unfortunately, before that cleanup happens, the `LShift` used in the loop exit check gets split through the phi, which prevents range check elimination (RCE) and, as a consequence, vectorization. There is a check that prevents splitting `LShift`s that modify the IV through the phi, but it only applies to a *counted loop*:

https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176

Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization.
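For readers who want to try the two shapes, the following is a minimal runnable sketch, assuming JDK 22 or later (where `java.lang.foreign` is final). The class and method names are hypothetical and not taken from the PR or its tests. It only demonstrates that both loops compute the same result; whether C2 vectorizes either of them has to be checked separately, for example with an IR test.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical demo class; names are illustrative only.
final class ByteSizeLoopShapes {
    // Limit computed inside the loop exit check (the shape reported as not vectorizing).
    static void fillLimitInExitCheck(MemorySegment ms, long value) {
        for (long i = 0; i < ms.byteSize() / 8L; i++) {
            ms.setAtIndex(ValueLayout.JAVA_LONG, i, value);
        }
    }

    // Limit hoisted manually out of the loop exit check (the shape reported as vectorizing).
    static void fillLimitHoisted(MemorySegment ms, long value) {
        long size = ms.byteSize();
        for (long i = 0; i < size / 8L; i++) {
            ms.setAtIndex(ValueLayout.JAVA_LONG, i, value);
        }
    }

    // Sums all longs in the segment so the two fills can be compared.
    static long sum(MemorySegment ms) {
        long total = 0;
        long count = ms.byteSize() / 8L;
        for (long i = 0; i < count; i++) {
            total += ms.getAtIndex(ValueLayout.JAVA_LONG, i);
        }
        return total;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // 8-byte alignment so that aligned JAVA_LONG accesses are valid.
            MemorySegment a = arena.allocate(1024, 8);
            MemorySegment b = arena.allocate(1024, 8);
            fillLimitInExitCheck(a, 42L);
            fillLimitHoisted(b, 42L);
            System.out.println(sum(a) + " " + sum(b));
        }
    }
}
```

Both methods are semantically identical; the only difference is where `byteSize()` is evaluated, which is what changes the shape C2 sees after parsing.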

So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `ciTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant, so the loop cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` already removes the cloned loop head, which enables counted loop detection in the following iteration and, in turn, vectorization.
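The effect of cloning the loop head can be pictured with a source-level analogy. This is only a sketch: C2 operates on bytecode and its IR, not on Java source, and the class and method names below are hypothetical. A `LongSupplier` stands in for `ms.byteSize()`.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration of loop-head cloning; not C2 code.
final class ClonedLoopHeadSketch {
    // Original shape: the exit check, including the limit computation, is the loop head.
    static long headTested(LongSupplier byteSize) {
        long iterations = 0;
        for (long i = 0; i < byteSize.getAsLong() / 8L; i++) {
            iterations++;
        }
        return iterations;
    }

    // Approximate shape after cloning: one copy of the check guards the loop entry,
    // the other copy sits in the loop body and recomputes the limit on every iteration.
    static long headCloned(LongSupplier byteSize) {
        long iterations = 0;
        long i = 0;
        if (i < byteSize.getAsLong() / 8L) {            // copy on the entry path
            do {
                iterations++;
                i++;
            } while (i < byteSize.getAsLong() / 8L);    // copy inside the loop
        }
        return iterations;
    }
}
```

In the second method, the limit is evaluated inside the loop, so until the duplicate is cleaned up it looks loop variant, which matches the explanation above.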

@merykitty also provides an alternative explanation. A node is only split through a phi if that split is profitable. While the split looks profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem.

## Change Description

Based on @merykitty's reasoning, this PR tracks whether wins in `split_through_phi()` are on the loop entry edge or on the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge.
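The decision rule can be sketched as follows. This is a simplified model written in Java for illustration, not the C++ code from the PR: the class name, the `policy` parameter, and in particular the `backedgeThreshold` parameter are assumptions, since the description above does not state how many backedge wins count as "a lot".

```java
// Hypothetical model of the profitability rule; not the HotSpot implementation.
final class SplitWinsSketch {
    private int entryWins;     // wins found on loop entry edges
    private int backedgeWins;  // wins found on the loop backedge

    void addEntryWin() {
        entryWins++;
    }

    void addBackedgeWin() {
        backedgeWins++;
    }

    // policy: assumed minimum number of total wins that must be exceeded.
    // backedgeThreshold: assumed number of backedge wins that outweighs entry wins.
    boolean profitable(int policy, int backedgeThreshold) {
        int totalWins = entryWins + backedgeWins;
        if (totalWins <= policy) {
            return false; // not enough wins overall
        }
        if (entryWins > 0) {
            // Entry-edge wins risk breaking the canonical loop shape,
            // so they are only accepted with many backedge wins.
            return backedgeWins >= backedgeThreshold;
        }
        return true;
    }
}
```

With this model, a split whose only win is on the entry edge is rejected, which corresponds to the `LShift` case in the example loop.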

<details><summary>Explored Alternatives</summary>
1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops.
2. Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only performs loop tree building and then a round of IGVN where `Loop` nodes have been created. This cleans up the duplicated loop limit field access inside the loop, which enables the counted loop detection in `PHASEIDEALLOOP1`. This fixes this issue and a few others, but has loads of unforeseen consequences for loopopts down the line, including some regressions.
</details>

This solution also has an impact on some tests:
 - `compiler/loopopts/InvariantCodeMotionReassociateAddSub.java` observes fewer `AddI` nodes ([d9a59af](https://github.com/openjdk/jdk/pull/26429/commits/d9a59af977da70575a1e215c504958b1fb3db6a6))
 - `compiler/vectorization/runner/ArrayIndexFillTest.java`: the only remaining failing case is `fillLongArray`, which is attributed to [JDK-8332878](https://bugs.openjdk.org/browse/JDK-8332878); the previously failing floating-point cases are fixed ([5839f15](https://github.com/openjdk/jdk/pull/26429/commits/5839f157cae57f80fb041251a0a28327a0970fae))
 - `compiler/loopopts/superword/TestMemorySegment.java` shows that the failing test cases tracked by [JDK-8331659](https://bugs.openjdk.org/browse/JDK-8331659) pass now ([63689f8](https://github.com/openjdk/jdk/pull/26429/commits/63689f84b364828f7b50979acf1443498dddd1da))
 - the reproducer from [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) is fixed with this PR. Added `TestMemorySegmentField.java` as a regression test.

## Testing

 - [x] GitHub Actions
 - [x] tier1 - tier3 plus some internal testing on Oracle supported platforms
 - [x] tier4 - tier6 on Oracle supported platforms
 - [ ] SPECjbb2025, SPECjvm2008, Dacapo23

## Acknowledgements

Big thanks to @merykitty for coming up with the solution to this issue and providing feedback, as well as @eme64, @chhagedorn, @TobiHartmann, and @rwestrel for discussing this issue and providing feedback.

-------------

Commit messages:
 - Address review comments
 - Add test from JDK-8348096
 - Fix cases of JDK-8332878 not caused by push through add
 - Adjust previously failing tests tracked by JDK-8331659
 - Adjust for eliminated nodes in InvariantCodeMotionReassociateAddSub.java
 - Split only profitable when not on entry edge
 - Add regression test

Changes: https://git.openjdk.org/jdk/pull/26429/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26429&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8356176
  Stats: 228 lines in 7 files changed: 187 ins; 18 del; 23 mod
  Patch: https://git.openjdk.org/jdk/pull/26429.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429

PR: https://git.openjdk.org/jdk/pull/26429