RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v24]

Christian Hagedorn chagedorn at openjdk.org
Wed May 21 10:26:06 UTC 2025


On Wed, 21 May 2025 08:00:56 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> To optimize a long counted loop and long range checks in a long or int
>> counted loop, the loop is turned into a loop nest. When the loop has
>> few iterations, the overhead of having an outer loop whose backedge is
>> never taken, has a measurable cost. Furthermore, creating the loop
>> nest usually causes one iteration of the loop to be peeled so
>> predicates can be set up. If the loop is short running, then it's an
>> extra iteration that's run with range checks (compared to an int
>> counted loop with int range checks).
>> 
>> This change doesn't create a loop nest when:
>> 
>> 1- it can be determined statically at loop nest creation time that the
>>    loop runs for a short enough number of iterations
>>   
>> 2- profiling reports that the loop runs for no more than ShortLoopIter
>>    iterations (1000 by default).
>>   
>> For 2-, a guard is added which is implemented as yet another predicate.
>> 
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>> 
>> - while c2 has a way to compute the number of iterations of an int
>>   counted loop, it doesn't have that for long counted loop. The
>>   existing logic for int counted loops promotes values to long to
>>   avoid overflows. I reworked it so it now works for both long and int
>>   counted loops.
>> 
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>>   the new predicate. Given the number of iterations is narrowed down
>>   by the predicate, the limit of the loop after transformation is a
>>   cast node that's control dependent on the short running loop
>>   predicate. Because once the counted loop is transformed, it is
>>   likely that range check predicates will be inserted and they will
>>   depend on the limit, the short running loop predicate has to be the
>>   one that's further away from the loop entry. Now it is also possible
>>   that the limit before transformation depends on a predicate
>>   (TestShortRunningLongCountedLoopPredicatesClone is an example), we
>>   can have: new predicates inserted after the transformation that
>>   depend on the casted limit that itself depend on old predicates
>>   added before the transformation. To solve this cicular dependency,
>>   parse and assert predicates are cloned between the old predicates
>>   and the loop head. The cloned short running loop parse predicate is
>>   the one that's used to insert the short running loop predicate.
>> 
>> - In the case of a long counted loop, the loop is transformed into a
>>   regular loop with a ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 75 commits:
> 
>  - Merge branch 'master' into JDK-8342692
>  - more
>  - compilation fix
>  - new benchmark
>  - Add benchmark for manual mismatch loop
>  - review
>  - Merge branch 'master' into JDK-8342692
>  - Update src/hotspot/share/opto/loopnode.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update src/hotspot/share/opto/graphKit.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update src/hotspot/share/opto/castnode.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - ... and 65 more: https://git.openjdk.org/jdk/compare/50a7755f...6100f757

src/hotspot/share/opto/castnode.cpp line 35:

> 33: #include "opto/type.hpp"
> 34: #include "castnode.hpp"
> 35: #include "loopnode.hpp"

You should add a `opto` prefix. `castnode.hpp` can be removed since it's already added above.

src/hotspot/share/opto/castnode.cpp line 328:

> 326: }
> 327: 
> 328: bool CastLLNode::inner_loop_backedge(Node* proj) {

`inner_loop_backedge()` and `cmp_used_at_inner_loop_exit_test()` can be made `static` and `used_at_inner_loop_exit_test()` `const`.

src/hotspot/share/opto/loopnode.cpp line 1140:

> 1138: };
> 1139: 
> 1140: class CloneShortLoopPredicatesVisitor : public PredicateVisitor {

Maybe add a quick comment here that we clone and insert at the same loop.

src/hotspot/share/opto/loopnode.cpp line 1148:

> 1146:                                   const NodeInShortLoopBody& node_in_loop_body,
> 1147:                                   PhaseIdealLoop* phase)
> 1148:     : _clone_predicate_to_loop(loop_head, node_in_loop_body, phase),

To be more explicit:
Suggestion:

  CloneShortLoopPredicatesVisitor(LoopNode* target_loop_head,
                                  const NodeInShortLoopBody& node_in_loop_body,
                                  PhaseIdealLoop* phase)
    : _clone_predicate_to_loop(target_loop_head, node_in_loop_body, phase),

src/hotspot/share/opto/loopnode.cpp line 1156:

> 1154: 
> 1155:   void visit(const ParsePredicate& parse_predicate) override {
> 1156:     _clone_predicate_to_loop.clone_parse_predicate(parse_predicate, true);

`clone_parse_predicate()` has its second parameter named `is_false_path_loop`. I think that no longer makes sense because we are now reusing the method outside Loop Unswitching.. Maybe we should just rename the parameter to `rewire_uncommon_proj_phi_inputs` which is eventually the name in `create_new_if_for_predicate()`. Additionally, we should rename `ParsePredicate::clone_to_unswitched_loop()` to `clone_to_loop()`. What do you think?

src/hotspot/share/opto/loopnode.cpp line 1159:

> 1157:     parse_predicate.kill(_phase->igvn());
> 1158:   }
> 1159:   void visit(const TemplateAssertionPredicate& template_assertion_predicate) override {

Suggestion:


  void visit(const TemplateAssertionPredicate& template_assertion_predicate) override {

src/hotspot/share/opto/loopnode.cpp line 1164:

> 1162:   }
> 1163: };
> 1164: // If the loop is either statically known to run for a small enough number of iterations or if profile data indicates

Suggestion:


// If the loop is either statically known to run for a small enough number of iterations or if profile data indicates

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099744340
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099742420
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099794431
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099795386
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099780513
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099777107
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099760430


More information about the hotspot-compiler-dev mailing list