RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v36]

Christian Hagedorn chagedorn at openjdk.org
Wed Jul 9 10:37:52 UTC 2025


On Tue, 8 Jul 2025 08:43:31 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> To optimize a long counted loop and long range checks in a long or int
>> counted loop, the loop is turned into a loop nest. When the loop has
>> few iterations, the overhead of having an outer loop whose backedge is
>> never taken has a measurable cost. Furthermore, creating the loop
>> nest usually causes one iteration of the loop to be peeled so
>> predicates can be set up. If the loop is short running, then it's an
>> extra iteration that's run with range checks (compared to an int
>> counted loop with int range checks).
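As a source-level analogy of the loop-nest transformation described above, here is a Java sketch. It is only an illustration: C2 performs the transformation on its intermediate representation, not on Java source. The class name `LoopNestSketch` and the inner-loop bound `CHUNK` are hypothetical, and the sketch assumes a stride of 1 and values small enough that no arithmetic overflows.

```java
import java.util.function.LongConsumer;

// Source-level analogy only; C2 transforms its IR, not Java source.
// Assumes stride 1 and values small enough that neither stop - outer
// nor outer + CHUNK overflows. CHUNK is a hypothetical inner bound.
final class LoopNestSketch {
    static final long CHUNK = Integer.MAX_VALUE;

    // Original shape: a long counted loop.
    static void flat(long start, long stop, LongConsumer body) {
        for (long i = start; i < stop; i++) {
            body.accept(i);
        }
    }

    // Loop nest: an outer long loop around an inner int loop. The inner
    // loop has an int induction variable, so int counted loop
    // optimizations can apply to it.
    static void nested(long start, long stop, LongConsumer body) {
        for (long outer = start; outer < stop; outer += CHUNK) {
            int innerStop = (int) Math.min(stop - outer, CHUNK);
            for (int j = 0; j < innerStop; j++) {
                body.accept(outer + j);
            }
        }
    }

    public static void main(String[] args) {
        long[] sum = {0};
        // 10 iterations: the outer loop body runs once, so its backedge
        // is never taken.
        nested(3, 13, i -> sum[0] += i);
        System.out.println(sum[0]);
    }
}
```

For a loop of 10 iterations the outer loop body executes once and its backedge is never taken, which is the overhead the description calls out for short running loops.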
>> 
>> This change doesn't create a loop nest when:
>> 
>> 1- it can be determined statically at loop nest creation time that the
>>    loop runs for a short enough number of iterations
>>   
>> 2- profiling reports that the loop runs for no more than ShortLoopIter
>>    iterations (1000 by default).
>>   
>> For 2-, a guard is added which is implemented as yet another predicate.
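The sketch below models, at the source level, what skipping the loop nest amounts to once the trip count is known to be small. `SHORT_LOOP_ITER` mirrors the `ShortLoopIter` default of 1000 given above; everything else (the class and method names, and returning `false` where compiled code would deoptimize) is an assumption made for illustration, not how the predicate is implemented in C2.

```java
import java.util.function.LongConsumer;

// Hypothetical model of the "no loop nest" path. Returning false stands in
// for the guard failing (compiled code would deoptimize); it is not a
// HotSpot API. Assumes stride 1 and that stop - start does not overflow.
final class ShortLoopGuardSketch {
    // Mirrors the ShortLoopIter default (1000) from the PR description.
    static final long SHORT_LOOP_ITER = 1000;

    static boolean runAsIntLoop(long start, long stop, LongConsumer body) {
        long trip = Math.max(0L, stop - start);
        if (trip > SHORT_LOOP_ITER) {
            return false; // guard fails: would deoptimize here
        }
        // Guard holds: the trip count fits in an int, so a plain int
        // counted loop suffices; no outer loop is needed.
        int n = (int) trip;
        for (int j = 0; j < n; j++) {
            body.accept(start + j);
        }
        return true;
    }

    public static void main(String[] args) {
        long[] sum = {0};
        System.out.println(runAsIntLoop(0, 10, i -> sum[0] += i));
        System.out.println(sum[0]);
        System.out.println(runAsIntLoop(0, 5000, i -> { }));
    }
}
```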
>> 
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>> 
>> - while C2 has a way to compute the number of iterations of an int
>>   counted loop, it doesn't have one for long counted loops. The
>>   existing logic for int counted loops promotes values to long to
>>   avoid overflows. I reworked it so it now works for both long and int
>>   counted loops.
>> 
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>>   the new predicate. Given that the number of iterations is narrowed
>>   down by the predicate, the limit of the loop after transformation is
>>   a cast node that's control dependent on the short running loop
>>   predicate. Because range check predicates are likely to be inserted
>>   once the counted loop is transformed, and they will depend on the
>>   limit, the short running loop predicate has to be the one that's
>>   farthest from the loop entry. It is also possible that the limit
>>   before transformation depends on a predicate
>>   (TestShortRunningLongCountedLoopPredicatesClone is an example). We
>>   can then have new predicates, inserted after the transformation,
>>   that depend on the casted limit, which itself depends on old
>>   predicates added before the transformation. To solve this circular
>>   dependency, parse and assert predicates are cloned between the old
>>   predicates and the loop head. The cloned short running loop parse
>>   predicate is the one that's used to insert the short running loop
>>   predicate.
>> 
>> - In the case of a long counted loop, the loop is transformed into a
>>   regular loop with a ...
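The overflow issue raised in the first bullet can be illustrated with a simplified trip-count computation, assuming a positive stride and a loop of the form `for (i = init; i < limit; i += stride)`. This hypothetical `TripCountSketch` does not reproduce the reworked C2 logic; it only shows why promoting to `long` is sufficient for int loops, while long loops have no wider primitive type to promote to, so overflow has to be detected explicitly.

```java
// Hypothetical illustration, not C2's implementation. Assumes stride > 0
// and a loop of the form: for (i = init; i < limit; i += stride).
final class TripCountSketch {
    // int loop: after promotion to long, limit - init cannot overflow,
    // and span + stride - 1 fits comfortably in a long.
    static long intTripCount(int init, int limit, int stride) {
        long span = (long) limit - (long) init;
        if (span <= 0) return 0;
        return (span + stride - 1) / stride;
    }

    // long loop: no wider primitive exists, so detect overflow explicitly.
    // Returns -1 when the count cannot be computed this way.
    static long longTripCount(long init, long limit, long stride) {
        long span;
        try {
            span = Math.subtractExact(limit, init);
        } catch (ArithmeticException e) {
            return -1;
        }
        if (span <= 0) return 0;
        // Avoid span + stride - 1, which could itself overflow.
        return span / stride + (span % stride != 0 ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(intTripCount(Integer.MIN_VALUE, Integer.MAX_VALUE, 1));
        System.out.println(longTripCount(Long.MIN_VALUE, Long.MAX_VALUE, 1));
    }
}
```

With `init = Integer.MIN_VALUE` and `limit = Integer.MAX_VALUE`, computing `limit - init` in `int` arithmetic would wrap to -1, whereas the promoted computation returns 4294967295. The long version with `Long.MIN_VALUE` and `Long.MAX_VALUE` reports -1 (not computable).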
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 97 commits:
> 
>  - review
>  - Merge branch 'master' into JDK-8342692
>  - Update src/hotspot/share/opto/c2_globals.hpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - small fix
>  - Merge branch 'master' into JDK-8342692
>  - review
>  - review
>  - Update test/micro/org/openjdk/bench/java/lang/foreign/HeapMismatchManualLoopTest.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopScaleOverflow.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopPredicatesClone.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - ... and 87 more: https://git.openjdk.org/jdk/compare/310ef856...bb69cc02

I gave your latest patch another spin in our testing. It's still running but it already found some issues:

- SA tests (see separate comment)
- `#include` order problem (see separate comment)

- Various `jdk/incubator/vector/*` tests are failing, for example `Byte128VectorLoadStoreTests.java` with additional VM flags `-XX:UseAVX=2` (so far it also reproduces with `-XX:UseAVX=0` and `-XX:UseAVX=1`):


#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/mach5/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S650407/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/05605dc0-bf5e-434a-82b5-65af69c62ec6/runs/591d89b1-11c0-415e-b2ce-4c0a13ce80f8/workspace/open/src/hotspot/share/opto/vectorization.cpp:141), pid=704535, tid=704555
#  assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop

Current CompileTask:
C2:7789 1280             jdk.incubator.vector.ByteVector::ldLongOp (48 bytes)

Stack: [0x00007f9ef7cfe000,0x00007f9ef7dfe000],  sp=0x00007f9ef7df8560,  free space=1001k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1bcb7a4]  VLoop::check_preconditions_helper() [clone .part.0]+0x824  (vectorization.cpp:141)
V  [libjvm.so+0x1bcba31]  VLoop::check_preconditions()+0x41  (vectorization.cpp:41)
V  [libjvm.so+0x1573ea1]  PhaseIdealLoop::auto_vectorize(IdealLoopTree*, VSharedData&)+0x241  (loopopts.cpp:4449)
V  [libjvm.so+0x155274d]  PhaseIdealLoop::build_and_optimize()+0xfdd  (loopnode.cpp:5270)
[...]

src/hotspot/share/opto/castnode.cpp line 35:

> 33: #include "opto/subnode.hpp"
> 34: #include "opto/type.hpp"
> 35: #include "opto/loopnode.hpp"

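The new include is not in sorted order (`opto/loopnode.hpp` has to move up to its sorted position, before `opto/subnode.hpp`), which causes `sources/TestIncludesAreSorted.java` to fail.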
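Since `t` sorts after `l`, placing `opto/loopnode.hpp` after `opto/type.hpp` breaks lexicographic order. A simplified check of adjacent entries, assuming plain `String.compareTo` ordering, shows this. `IncludeOrderSketch` is hypothetical and is not the actual logic of `TestIncludesAreSorted.java`.

```java
import java.util.List;

// Simplified, hypothetical check: compares adjacent include paths
// lexicographically. Not the actual logic of TestIncludesAreSorted.
final class IncludeOrderSketch {
    // Returns the index of the first include that sorts before its
    // predecessor, or -1 if the list is sorted.
    static int firstUnsorted(List<String> includes) {
        for (int i = 1; i < includes.size(); i++) {
            if (includes.get(i - 1).compareTo(includes.get(i)) > 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The three includes quoted from castnode.cpp in the review comment.
        List<String> quoted = List.of("opto/subnode.hpp", "opto/type.hpp", "opto/loopnode.hpp");
        System.out.println(firstUnsorted(quoted)); // prints 2
        // The same three includes in sorted order.
        List<String> sorted = List.of("opto/loopnode.hpp", "opto/subnode.hpp", "opto/type.hpp");
        System.out.println(firstUnsorted(sorted)); // prints -1
    }
}
```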

-------------

PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-3000973554
PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2194660384

