RFR: 8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression [v3]
Emanuel Peter
epeter at openjdk.org
Tue Aug 12 13:27:17 UTC 2025
On Tue, 12 Aug 2025 09:34:37 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:
>> A loop of the form
>>
>> MemorySegment ms = {};
>> for (long i = 0; i < ms.byteSize() / 8L; i++) {
>> // vectorizable work
>> }
>>
>> does not vectorize, whereas
>>
>> MemorySegment ms = {};
>> long size = ms.byteSize();
>> for (long i = 0; i < size / 8L; i++) {
>> // vectorizable work
>> }
>>
>> vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the `LShift` used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting `LShift`s modifying the IV of a *counted loop*:
>>
>> https://github.com/openjdk/jdk/blob/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd/src/hotspot/share/opto/loopopts.cpp#L1172-L1176
>>
>> Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization.
>>
>> So, why is the counted loop not detected? Because the call to `byteSize()` is inside the loop head, and `CiTypeFlow::clone_loop_heads()` duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first `ITER_GVN` in `PHASEIDEALLOOP1` will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization.
>>
>> @merykitty also provides an alternative explanation. A node is only split through a phi if that splitting is profitable. While the split looks to be profitable in the example above, it only generates wins on the loop entry edge. This ends up destroying the canonical loop structure and prevents further optimization. Other issues like [JDK-8348096](https://bugs.openjdk.org/browse/JDK-8348096) suffer from the same problem
>>
>> ## Change Description
>>
>> Based on @merykitty's reasoning, this PR tracks if wins in `split_through_phi()` are on the loop entry edge or the loop backedge. If there are wins on a loop entry edge, we do not consider the split to be profitable unless there are a lot of wins on the backedge.
>>
>> <details><summary>Explored Alternatives</summary>
>> 1. Prevent splitting `LShift`s in uncounted loops that have the same shape as a counted loop would have. This fixes this specific issue, but causes potential regressions with uncounted loops.
>> 2. I...
>
> Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:
>
> - Improve comment
> - Fix build failure on product
src/hotspot/share/opto/loopnode.hpp line 1643:
> 1641: int _loop_entry_wins;
> 1642: // Number of wins on a loop back-edge, which pay dividends on every iteration.
> 1643: int _loop_back_wins;
Suggestion:
// Sum of all wins regardless of where they happen. This applies to Loops phis as well as non-loop phis.
int _total_wins;
// For Loops, wins have different impact depending on if they happen on loop entry or on the backedge.
// Number of wins on a loop entry edge if the split is through a loop head,
// otherwise 0. Entry edge wins only pay dividends once on loop entry.
int _loop_entry_wins;
// Number of wins on a loop back-edge, which pay dividends on every iteration.
int _loop_back_wins;
src/hotspot/share/opto/loopnode.hpp line 1659:
> 1657: }
> 1658: _total_wins++;
> 1659: }
Why not make the `region` a field? That way, you don't have to pass it every time.
And: if the region is a Loop, then we should have `_total_wins = _loop_entry_wins + _loop_back_wins`, correct? And if it is not a Loop, then the loop fields should be zero. You could add such an assert in `profitable` :)
src/hotspot/share/opto/loopnode.hpp line 1673:
> 1671: // dependant node, i.e. spliting a Bool node after splitting a Cmp node.
> 1672: bool profitable(int policy) const {
> 1673: return policy < 0 || (_loop_entry_wins == 0 && _total_wins > policy) || _loop_back_wins > policy;
This already looks better. I'm wondering if we can still improve the readability a bit. Why not group the descriptions with the corresponding conditions?
// In general this means that the split has to have more wins than specified
// in the policy. In loops, we need to be careful when splitting, because it
// can sufficiently rearrange the loop structure to prevent RCE and thus
// vectorization. Thus, we only deem splitting profitable if the win of a
// split is not on the entry edge, as such wins only pay off once and have
// a high chance of messing up the loop structure.
This seems to go with condition
`(_loop_entry_wins == 0 && _total_wins > policy)`
// in the policy. In loops, we need to be careful when splitting, because it
// can sufficiently rearrange the loop structure to prevent RCE and thus
// vectorization.
Maybe we can restate the argument a little, and make it more explicit what kinds of rearrangements are problematic here? Maybe an example could help. What is it exactly that we risk loosing here?
// vectorization. Thus, we only deem splitting profitable if the win of a
// split is not on the entry edge, as such wins only pay off once and have
// a high chance of messing up the loop structure.
How is this impacted if we have wins on both the entry and the backedge? Is that possible? Do we have any benchmarks here?
It would be nice if we could say that if we **only** had entry wins and no backedge wins, then it's ok to not split, because this win would only have happened once per loop execution. But as soon as we have wins on the backedge, it would be profitable to split, and we should do so, right?
// a high chance of messing up the loop structure. However, if there are
// wins on the entry edge and also sufficient wins on the backadge, which
// pay off on every iteration, a split is also deemed profiable.
Ah, you are arguing about that here actually. Must be about this condition:
`_loop_back_wins > policy`
You see, I'm struggling a bit to follow here and have to reconstruct it ;)
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269782994
PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269836104
PR Review Comment: https://git.openjdk.org/jdk/pull/26429#discussion_r2269830904
More information about the hotspot-compiler-dev
mailing list