RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]
Fei Gao
fgao at openjdk.org
Tue Jan 20 11:24:52 UTC 2026
On Fri, 16 Jan 2026 09:33:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>>
>> - Fix build failure after rebasing and address review comments
>> - Merge branch 'master' into optimize-atomic-post
>> - Fixed new test failures after rebasing and refined parts of the code to address review comments
>> - Merge branch 'master' into optimize-atomic-post
>> - Merge branch 'master' into optimize-atomic-post
>> - Clean up comments for consistency and add spacing for readability
>> - Fix some corner case failures and refined part of code
>> - Merge branch 'master' into optimize-atomic-post
>> - Refine ascii art, rename some variables and resolve conflicts
>> - Merge branch 'master' into optimize-atomic-post
>> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504
>
> src/hotspot/share/opto/loopTransform.cpp line 1841:
>
>> 1839: }
>> 1840: }
>> 1841:
>
> A few questions:
> - Why not cache the `skip_assertion_predicates_with_halt` values? Do the values change over time? If there are lots of predicates, you will do a traversal over and over again.
> - Why do we need this special logic for the `drain` loop cloning? What makes it different to other cloning cases?
> - The new ctrl you set is either at the post-head entry, or after skipping the predicates. Why did you chose those?
Thanks for the questions. Great points.
> Why not cache the `skip_assertion_predicates_with_halt` values? Do the values change over time? If there are lots of predicates, you will do a traversal over and over again.
You’re right. There’s no strong reason not to cache the result of `skip_assertion_predicates_with_halt() `here. We can do that to avoid repeated traversals when there are many predicates. I’ll refine the code accordingly.
> Why do we need this special logic for the drain loop cloning? What makes it different to other cloning cases?
Drain-loop cloning is significantly more complex than other cloning cases.
When cloning a `post loop`, the loop structure is relatively simple and does not involve a `pre-loop` or a `minimum-trip guard`. The control-flow shape typically looks like:
...
/ \
... multiple predicates
/ \
EntryControl
/ \
main loop head
In this case, some nodes in the main loop may take any control node between the predicate `IfTrue` nodes and the loop `EntryControl` as their control input, meaning they can float along that control chain. When such nodes are cloned, their control inputs need to be fixed to the corresponding nodes in the cloned loop.
This is already handled by `initialize_assertion_predicates_for_post_loop()`, which correctly fixes control inputs for `TemplateAssertionPredicates`. See https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/src/hotspot/share/opto/predicates.cpp#L157
However, when inserting a `drain loop`, the control-flow structure is more involved:
main zero-trip guard
/ \
IfFalse IfTrue
/ \
... multiple predicates
/ \
EntryControl
/ \
main loop head
Here, the main loop is preceded by both a `pre-loop` and a `minimum-trip guard`. As a result, control inputs of nodes inside the main loop may originate anywhere from the `IfTrue` of the `minimum-trip guard` down to the loop `EntryControl`. These cases are outside the scope of `initialize_assertion_predicates_for_post_loop()`, which only handles `TemplateAssertionPredicates`.
In practice, I’ve also observed nodes attached to `InitializedAssertionPredicates`, which are not covered by that helper either.
This is why we need separate, more general logic when cloning drain loops.
> The new ctrl you set is either at the post-head entry, or after skipping the predicates. Why did you chose those?
Ideally, each cloned node should be reattached to the exact corresponding control node in the new loop. For example:
1. If the original node was controlled by the `IfTrue` of the `minimum-trip guard`, the clone should be attached to the corresponding `IfTrue` in the cloned control flow.
2. If it was controlled by a predicate’s taken path, the clone should ideally attach to the corresponding predicate in the cloned loop.
The first case is relatively straightforward and is handled by skipping predicates and assigning the appropriate control node.
The second case is harder: finding the precise corresponding “taken” node for every predicate instance is non-trivial. For those nodes, I conservatively sink them to the bottom of the control chain by attaching them to the `post-loop` entry control. This guarantees correctness, even if it sacrifices some control precision.
That was the rationale behind this logic.
Happy to hear your thoughts on whether this approach makes sense or if there’s a better alternative :)
> src/hotspot/share/opto/loopnode.hpp line 1507:
>
>> 1505: // If 'back_ctrl' is null: (Specially for pre-loop exit in resolve_input_for_drain_or_post())
>> 1506: // - Clone 'n' into 'preheader_ctrl' if its block does not strictly dominate 'preheader_ctrl'.
>> 1507: // - Otherwise, return 'n'.
>
> Personally, I think it would be better to avoid extensive documentation in both `hpp` and `cpp`. The chances that the documentation eventually goes out of sync is big.
>
> I'd suggest putting the documentation in the `cpp`, and either no documentation or only a summary in `hpp`.
Agreed.
> src/hotspot/share/opto/loopopts.cpp line 2384:
>
>> 2382: //
>> 2383: // Let us look at the data path of the trip counter, as an example
>> 2384: // to understand the data uses:
>
> I love the ASCII art :)
>
> But I'm wondering if it might be better not to use `iv` names like `pre_incr`, just because the `iv` phi structure is simpler than others, and this hides the complexity of other phis.
>
> But I don't have the solution yet.
How about renaming these variables to align with the suggestion in https://github.com/openjdk/jdk/pull/22629/files#r2694752041 :
pre_incr -> pre_exit_value
main_old -> main_exit_value
drain_incr -> drain_exit_value
> What nodes might already be on the `worklist` before we get here?
The `worklist` should be empty before we reach this point. It’s reused across` fix_ctrl_uses()`, `fix_data_uses()`, and similar helpers, but each of them expects and maintains an empty `worklist` on entry.
> Why not rename `main_old` to `main_incr` or `main_backedge`, or something else that is a bit more specific?
Yes, we can rename `main_old` to `main_exit_value`, as suggested above.
> src/hotspot/share/opto/loopopts.cpp line 2485:
>
>> 2483: }
>> 2484:
>> 2485: Node_List visit_list;
>
> Suggestion:
>
> ResourceMark rm;
> Node_List visit_list;
>
> Can we do this, or do we run into issues?
No, we can’t. I think we discussed this before. Since `old_new` can grow within this scope, we can’t use `ResourceMark` here.
> src/hotspot/share/opto/loopopts.cpp line 2489:
>
>> 2487:
>> 2488: while (worklist.size() != 0) {
>> 2489: Node* use = worklist.pop();
>
> Can you add a comment above these lines and say what we are iterating over and what we do with each `use`? Only a single line, so the reader knows what to expect :)
Sure. I'll add something like:
// Walk all non-loop-local uses of "main_old" to clone exit-path data-flow and rebuild the
// corresponding `drain_merge_phi`s.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2707896245
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2705289890
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2705362961
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2705512493
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2705370501
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2705403982
More information about the hotspot-compiler-dev
mailing list