RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]

Fei Gao fgao at openjdk.org
Thu Jan 15 18:30:45 UTC 2026


On Wed, 14 Jan 2026 14:08:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>> 
>>  - Fix build failure after rebasing and address review comments
>>  - Merge branch 'master' into optimize-atomic-post
>>  - Fixed new test failures after rebasing and refined parts of the code to address review comments
>>  - Merge branch 'master' into optimize-atomic-post
>>  - Merge branch 'master' into optimize-atomic-post
>>  - Clean up comments for consistency and add spacing for readability
>>  - Fix some corner case failures and refined part of code
>>  - Merge branch 'master' into optimize-atomic-post
>>  - Refine ascii art, rename some variables and resolve conflicts
>>  - Merge branch 'master' into optimize-atomic-post
>>  - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504
>
> src/hotspot/share/opto/loopTransform.cpp line 1337:
> 
>> 1335:   } else {
>> 1336:     if (get_ctrl(n) != back_ctrl) { return n; }
>> 1337:   }
> 
> Is the `clone_up_backedge` name still correct for all cases?
> It seems before we only cloned up nodes that belonged to `back_ctrl`.
> But now we also clone from exit pre-loop **exit path**.
> 
> I never liked the `goo` anyway...
> 
> And: why is it called `clone_up`? What "up" does it refer to?
> 
> What about `clone_up(FromBackedge, ...` and `clone_up(FromPreLoopExit, ...`, using two enums? Then we can be a bit more explicit which case we are in, and add corresponding asserts for `back_ctrl` (non-null vs null).

Yeah, I agree. We should rename it. How about `resolve_value_for_preheader`?

> What about clone_up(FromBackedge, ... and clone_up(FromPreLoopExit, ..., using two enums? Then we can be a bit more explicit which case we are in, and add corresponding asserts for back_ctrl (non-null vs null).

That sounds good. Something like:

resolve_value_for_preheader(FromBackedge, ...)
resolve_value_for_preheader(FromPreLoopExit, ...)

> src/hotspot/share/opto/loopTransform.cpp line 1402:
> 
>> 1400:   Node* main_backedge_ctrl = main_head->back_control();
>> 1401:   // For the post loop, we call clone_up_backedge_goo() to obtain the fall-out values
>> 1402:   // from the main loop, which serve as the fall-in values for the post loop.
> 
> The naming of `clone_up_backedge_goo` is confusing me a bit:
> We seem to clone down (main -> post) and we clone fall-out (exit) values, and not backedge values.

Agreed - that’s really confusing. I had considered renaming it before, but it would have required touching some unrelated files, so I decided against it. However, I now think the rename is necessary for clarity.

> I'm wondering if/how ControlAroundStripMined relates to the post loop case?

We use `ControlAroundStripMined` to clone the post loop. See the mainline code here: https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/share/opto/loopTransform.cpp#L1697

> if the branch is not taken, you assume it can only be the drain loop.

Yes.

> Could we assert mode == InsertVectorizedDrain down there?

Sounds good. I can add it.

> What about renaming pre_incr -> main_input or after_pre. That would remove the iv connotation, and be more parallel to drain_input or after_main.

Sounds reasonable.

> But a similar question here: what if we had a split-through-phi here?
Before split-through-phi, we would have had some input x, but afterwards we'd have op(x) here.
Would that not mean that we should use x for the input to drain_input, but are getting op(x)?

A similar argument applies here as well. We also use `clone_up_backedge_goo()` along this path to resolve the correct pre_incr value after the pre-loop. Even if a split-through-phi has occurred, the helper will walk and clone the input chain as needed to recover the appropriate value whose target control is the pre-exit control (`main_merge_region->in(1)`).

> src/hotspot/share/opto/loopTransform.cpp line 1482:
> 
>> 1480:       pre_incr = clone_up_backedge_goo(nullptr, main_merge_region->in(1), pre_incr, visited, clones);
>> 1481:     }
>> 1482:     drain_input->set_req(1, pre_incr);
> 
> Just a control question: above you did:
> `drain_input = PhiNode::make(main_merge_region, iv_after_main);`
> Does that not put `iv_after_main` at slot `1`, and now we overwrite it with `pre_incr`?

Yes, this is what we're generating:
`drain_input = merge (pre_incr, iv_after_main)`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694628680
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694641458
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694662395
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2695142459
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2695189822


More information about the hotspot-compiler-dev mailing list