RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22]
Emanuel Peter
epeter at openjdk.org
Mon May 19 09:13:59 UTC 2025
On Thu, 15 May 2025 14:59:18 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> To optimize a long counted loop and long range checks in a long or int
>> counted loop, the loop is turned into a loop nest. When the loop has
>> few iterations, the overhead of having an outer loop whose backedge is
>> never taken, has a measurable cost. Furthermore, creating the loop
>> nest usually causes one iteration of the loop to be peeled so
>> predicates can be set up. If the loop is short running, then it's an
>> extra iteration that's run with range checks (compared to an int
>> counted loop with int range checks).
>>
>> This change doesn't create a loop nest when:
>>
>> 1- it can be determined statically at loop nest creation time that the
>> loop runs for a short enough number of iterations
>>
>> 2- profiling reports that the loop runs for no more than ShortLoopIter
>> iterations (1000 by default).
>>
>> For 2-, a guard is added which is implemented as yet another predicate.
>>
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>>
>> - while c2 has a way to compute the number of iterations of an int
>> counted loop, it doesn't have that for long counted loop. The
>> existing logic for int counted loops promotes values to long to
>> avoid overflows. I reworked it so it now works for both long and int
>> counted loops.
>>
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>> the new predicate. Given the number of iterations is narrowed down
>> by the predicate, the limit of the loop after transformation is a
>> cast node that's control dependent on the short running loop
>> predicate. Because once the counted loop is transformed, it is
>> likely that range check predicates will be inserted and they will
>> depend on the limit, the short running loop predicate has to be the
>> one that's further away from the loop entry. Now it is also possible
>> that the limit before transformation depends on a predicate
>> (TestShortRunningLongCountedLoopPredicatesClone is an example), we
>> can have: new predicates inserted after the transformation that
>> depend on the casted limit that itself depend on old predicates
>> added before the transformation. To solve this cicular dependency,
>> parse and assert predicates are cloned between the old predicates
>> and the loop head. The cloned short running loop parse predicate is
>> the one that's used to insert the short running loop predicate.
>>
>> - In the case of a long counted loop, the loop is transformed into a
>> regular loop with a ...
>
> Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision:
>
> - Update src/hotspot/share/opto/loopnode.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/graphKit.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/castnode.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/loopnode.hpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/loopnode.hpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/loopnode.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/loopnode.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/loopnode.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/loopnode.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - Update src/hotspot/share/opto/loopnode.cpp
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - ... and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f
I also looked back at the results, I think it was this:
https://github.com/openjdk/jdk/pull/21630#issuecomment-2587016221
Nice to see that your patch is faster for small iterations. But what I am missing: where do the lines cross? I.e. at what iteration count does the loop-nest become profitable?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2890249311
More information about the hotspot-compiler-dev
mailing list