RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20]
Christian Hagedorn
chagedorn at openjdk.org
Thu May 15 13:04:04 UTC 2025
On Fri, 9 May 2025 12:41:14 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> To optimize a long counted loop and long range checks in a long or int
>> counted loop, the loop is turned into a loop nest. When the loop has
>> few iterations, the overhead of having an outer loop whose backedge is
>> never taken, has a measurable cost. Furthermore, creating the loop
>> nest usually causes one iteration of the loop to be peeled so
>> predicates can be set up. If the loop is short running, then it's an
>> extra iteration that's run with range checks (compared to an int
>> counted loop with int range checks).
>>
>> This change doesn't create a loop nest when:
>>
>> 1- it can be determined statically at loop nest creation time that the
>> loop runs for a short enough number of iterations
>>
>> 2- profiling reports that the loop runs for no more than ShortLoopIter
>> iterations (1000 by default).
>>
>> For 2-, a guard is added which is implemented as yet another predicate.
>>
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>>
>> - while c2 has a way to compute the number of iterations of an int
>> counted loop, it doesn't have that for long counted loop. The
>> existing logic for int counted loops promotes values to long to
>> avoid overflows. I reworked it so it now works for both long and int
>> counted loops.
>>
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>> the new predicate. Given the number of iterations is narrowed down
>> by the predicate, the limit of the loop after transformation is a
>> cast node that's control dependent on the short running loop
>> predicate. Because once the counted loop is transformed, it is
>> likely that range check predicates will be inserted and they will
>> depend on the limit, the short running loop predicate has to be the
>> one that's further away from the loop entry. Now it is also possible
>> that the limit before transformation depends on a predicate
>> (TestShortRunningLongCountedLoopPredicatesClone is an example), we
>> can have: new predicates inserted after the transformation that
>> depend on the casted limit that itself depend on old predicates
>> added before the transformation. To solve this cicular dependency,
>> parse and assert predicates are cloned between the old predicates
>> and the loop head. The cloned short running loop parse predicate is
>> the one that's used to insert the short running loop predicate.
>>
>> - In the case of a long counted loop, the loop is transformed into a
>> regular loop with a ...
>
> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:
>
> - Emanuel's review
> - Christian's review
Sorry for the delay, here are some more mostly minor comments. Will make another pass again afterwards.
src/hotspot/share/opto/c2_globals.hpp line 838:
> 836: \
> 837: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \
> 838: "long counted loop/long range checks: don't create loop nest if" \
Suggestion:
"long counted loop/long range checks: don't create loop nest if " \
src/hotspot/share/opto/c2_globals.hpp line 839:
> 837: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \
> 838: "long counted loop/long range checks: don't create loop nest if" \
> 839: "loop runs for small enough number of iterations") \
Suggestion:
"loop runs for small enough number of iterations.") \
src/hotspot/share/opto/c2_globals.hpp line 842:
> 840: \
> 841: develop(bool, StressShortRunningLongLoop, false, \
> 842: "Speculate all long counted loops are short running when bounds" \
Suggestion:
"Speculate all long counted loops are short running when bounds " \
src/hotspot/share/opto/c2_globals.hpp line 843:
> 841: develop(bool, StressShortRunningLongLoop, false, \
> 842: "Speculate all long counted loops are short running when bounds" \
> 843: "are unknown even if profile data doesn't say so") \
Suggestion:
"are unknown even if profile data doesn't say so.") \
src/hotspot/share/opto/castnode.cpp line 416:
> 414: }
> 415: }
> 416: // if it's a cast created by PhaseIdealLoop::create_loop_nest(), don't transform it until the counted loop is created
Maybe be more specific here?
Suggestion:
// If it's a cast created by PhaseIdealLoop::short_running_loop(), don't transform it until the counted loop is created
src/hotspot/share/opto/graphKit.cpp line 4055:
> 4053: if (ShortRunningLongLoop) {
> 4054: // Will narrow the limit down with a cast node. Predicates added later may depend on the cast so should be last when
> 4055: // starting from the loop.
Suggestion:
// walking up from the loop.
src/hotspot/share/opto/loopnode.cpp line 1128:
> 1126: class NodeInShortLoopBody : public NodeInLoopBody {
> 1127: PhaseIdealLoop* _phase;
> 1128: IdealLoopTree* _ilt;
The pointers can be made const:
Suggestion:
PhaseIdealLoop* const _phase;
IdealLoopTree* const _ilt;
src/hotspot/share/opto/loopnode.cpp line 1172:
> 1170: // Only process if we are in the correct Predicate Block.
> 1171: return;
> 1172: }
Do we really need this check? Could we not just clone all Template Assertion Predicates that we find? I think with the recent Assertion Predicate changes, we are sure that all Template Assertion Predicates found belong to this loop. Otherwise, they would already be marked useless and `visit()` is not called on them.
src/hotspot/share/opto/loopnode.cpp line 1180:
> 1178:
> 1179: // If bounds are known, or profile data indicates it runs for a small enough number of iterations, so the loop doesn't
> 1180: // need an outer loop, don't create the outer loop
I suggest to add more details here from the first paragraph of the PR description where you describe the motivation. Maybe something like:
// If the loop is either statically known to run for a small enough number of iterations or if profile data indicates
// that, we don't want an outer loop for the following reasons:
// <info from first paragraph of the PR description>
//
// In the short running case, turn the loop into a regular loop again and transform the long range checks:
// - LongCountedLoop: Create LoopNode but keep the loop limit type with a CastLL node to avoid that we later try to
// create a Loop Limit Check when turning the LoopNode into a CountedLoopNode.
// - CountedLoop: Can be reused.
src/hotspot/share/opto/loopnode.cpp line 1191:
> 1189: loop->compute_trip_count(this, bt);
> 1190: // Loop must run for no more than iter_limits as it guarantees no overflow of scale * iv in long range checks.
> 1191: bool known_short_running_loop = head->trip_count() <= iters_limit / ABS(stride_con);
Can you also add a comment about the decision of the hardcoded `iters_limit / ABS(stride_con)` limit to indicate a short running long loop?
src/hotspot/share/opto/loopnode.cpp line 1198:
> 1196: profile_short_running_loop = true;
> 1197: } else {
> 1198: profile_short_running_loop = !head->is_profile_trip_failed() && head->profile_trip_cnt() < iters_limit / ABS(stride_con);
Why do we compare with `<=` above but here with `<`?
src/hotspot/share/opto/loopnode.cpp line 1225:
> 1223: // Predicate). The current limit could, itself, be dependent on an existing predicate. Clone parse and template
> 1224: // assertion predicates below existing predicates to get proper ordering of predicates when walking from the loop
> 1225: // up: future predicates, Short Running Long Loop Predicate, existing predicates.
Maybe you missed the visualization I've added in a comment for an earlier commit. I would find it quite useful to quickly grasp the idea, what do you think?
//
// Existing Hoisted
// Check Predicates
// |
// New Short Running Long
// Loop Predicate
// |
// Cloned Parse Predicates and
// Template Assertion Predicates
// |
// Loop
src/hotspot/share/opto/loopnode.cpp line 1263:
> 1261: #ifndef PRODUCT
> 1262: // report that the loop predication has been actually performed
> 1263: // for this loop
The trace message below already suggests that the predicate was added. So you might want to just remove this comment.
Suggestion:
src/hotspot/share/opto/loopnode.cpp line 1265:
> 1263: // for this loop
> 1264: if (TraceLoopLimitCheck) {
> 1265: tty->print_cr("Short Loop Check generated:");
Suggestion:
tty->print_cr("Short Long Loop Check Predicate generated:");
src/hotspot/share/opto/loopnode.cpp line 1269:
> 1267: }
> 1268: #endif
> 1269: entry_control = head->skip_strip_mined()->in(LoopNode::EntryControl);
It looks like this line rather belongs to the `Predicate` on L1275? Might have been moved here by accident.
src/hotspot/share/opto/loopnode.cpp line 1273:
> 1271: // We're turning a long counted loop into a regular loop that will be converted into an int count loop. That loop
> 1272: // won't need loop limit checks (iters_limit guarantees that). Add a cast to make sure that, whatever transformation
> 1273: // happens by the time the counted loop is created, c2 knows enough about the loop's limit that it doesn't try to
Suggestion:
// happens by the time the counted loop is created, C2 knows enough about the loop's limit that it doesn't try to
src/hotspot/share/opto/loopnode.cpp line 1290:
> 1288:
> 1289: Node* int_zero = _igvn.intcon(0);
> 1290: set_ctrl(int_zero, C->root());
I think you can just use `PhaseIdealLoop::intcon()` which takes care of `set_ctrl()`:
Suggestion:
Node* int_zero = intcon(0);
src/hotspot/share/opto/loopnode.cpp line 1298:
> 1296: // Clone the iv data nodes as an integer iv
> 1297: Node* int_stride = _igvn.intcon(stride_con);
> 1298: set_ctrl(int_stride, C->root());
Same here:
Suggestion:
Node* int_stride = intcon(stride_con);
src/hotspot/share/opto/loopnode.cpp line 1299:
> 1297: Node* int_stride = _igvn.intcon(stride_con);
> 1298: set_ctrl(int_stride, C->root());
> 1299: Node* inner_phi = new PhiNode(head, new_phi_t);
You only seem to define `new_phi_t` as `TypeInt::INT` and then use it here once. Maybe just inline it?
Suggestion:
Node* inner_phi = new PhiNode(head, TypeInt::INT);
src/hotspot/share/opto/loopnode.cpp line 1302:
> 1300: Node* inner_incr = new AddINode(inner_phi, int_stride);
> 1301: Node* inner_cmp = nullptr;
> 1302: inner_cmp = new CmpINode(inner_incr, new_limit);
Can be merged:
Suggestion:
Node* inner_cmp = new CmpINode(inner_incr, new_limit);
src/hotspot/share/opto/loopnode.hpp line 310:
> 308:
> 309: void set_trip_count(julong tc) { _trip_count = checked_cast<uint>(tc); }
> 310: julong trip_count() { return _trip_count; }
Can be made const:
Suggestion:
julong trip_count() const { return _trip_count; }
src/hotspot/share/opto/loopnode.hpp line 398:
> 396:
> 397: void set_trip_count(julong tc) { _trip_count = tc; }
> 398: julong trip_count() { return _trip_count; }
Can be made const:
Suggestion:
julong trip_count() const { return _trip_count; }
src/hotspot/share/opto/predicates.cpp line 945:
> 943: tty->print_cr("- Loop Predicate Block:");
> 944: _loop_predicate_block.dump(" ");
> 945: tty->print_cr("- Short Running Loop Predicate Block:");
Suggestion:
tty->print_cr("- Short Running Long Loop Predicate Block:");
src/hotspot/share/opto/predicates.hpp line 77:
> 75: * be added once above the Loop Limit Check Parse Predicate for a loop.
> 76: * - Short Short: This predicate is created when a long counted loop is transformed into an int counted
> 77: * Running Long loop. In the general, that transformation requires an outer loop to guarantee that the new
Suggestion:
* Running Long loop. In general, that transformation requires an outer loop to guarantee that the new
-------------
PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2842765753
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090598643
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090599557
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090598904
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090599820
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091094942
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091096027
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091097871
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091107235
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090658362
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091119720
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091051479
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090861487
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091056745
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091054533
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091078881
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091058908
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091091033
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091091490
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091067560
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091060848
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091091968
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091092991
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090855070
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090854689
More information about the hotspot-compiler-dev
mailing list