RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v9]

Tobias Hartmann thartmann at openjdk.org
Wed Feb 5 06:08:20 UTC 2025


On Tue, 4 Feb 2025 10:11:36 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> To optimize a long counted loop and long range checks in a long or int
>> counted loop, the loop is turned into a loop nest. When the loop has
>> few iterations, the overhead of having an outer loop whose backedge is
>> never taken has a measurable cost. Furthermore, creating the loop
>> nest usually causes one iteration of the loop to be peeled so
>> predicates can be set up. If the loop is short running, then it's an
>> extra iteration that's run with range checks (compared to an int
>> counted loop with int range checks).
>> 
>> This change doesn't create a loop nest when:
>> 
>> 1- it can be determined statically at loop nest creation time that the
>>    loop runs for a short enough number of iterations
>>   
>> 2- profiling reports that the loop runs for no more than ShortLoopIter
>>    iterations (1000 by default).
>>   
>> For 2-, a guard is added which is implemented as yet another predicate.
>> 
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>> 
>> - while C2 has a way to compute the number of iterations of an int
>>   counted loop, it doesn't have one for long counted loops. The
>>   existing logic for int counted loops promotes values to long to
>>   avoid overflows. I reworked it so it now works for both long and int
>>   counted loops.
>> 
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>>   the new predicate. Given the number of iterations is narrowed down
>>   by the predicate, the limit of the loop after transformation is a
>>   cast node that's control dependent on the short running loop
>>   predicate. Once the counted loop is transformed, range check
>>   predicates are likely to be inserted, and they will depend on the
>>   limit, so the short running loop predicate has to be the one
>>   that's further away from the loop entry. It is also possible that
>>   the limit before transformation depends on a predicate
>>   (TestShortRunningLongCountedLoopPredicatesClone is an example). We
>>   can then have new predicates inserted after the transformation
>>   that depend on the cast limit, which itself depends on old
>>   predicates added before the transformation. To solve this circular
>>   dependency, parse and assert predicates are cloned between the old
>>   predicates and the loop head. The cloned short running loop parse
>>   predicate is the one that's used to insert the short running loop
>>   predicate.
>> 
>> - In the case of a long counted loop, the loop is transformed into a
>>   regular loop with a ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits:
> 
>  - TestMemorySegment test fix
>  - test wip
>  - Merge branch 'master' into JDK-8342692
>  - refactor
>  - Merge branch 'master' into JDK-8342692
>  - Merge branch 'master' into JDK-8342692
>  - Merge branch 'master' into JDK-8342692
>  - Merge branch 'master' into JDK-8342692
>  - review
>  - reviews
>  - ... and 22 more: https://git.openjdk.org/jdk/compare/3f1d9b57...7dd6fde9
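
For readers skimming the quoted description, here is a rough source-level analogy of the short running loop guard. It is only a sketch, not the C2 implementation: the class and method names are hypothetical, the threshold constant merely mirrors the ShortLoopIter default mentioned above, and the fallback loop stands in for what C2 would do through deoptimization and the loop nest. The trip count helper shows one way to avoid overflow for long bounds; how the patch actually does this is not stated here.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; none of these names exist in HotSpot.
final class ShortRunningLoopSketch {
    // Assumed to mirror the ShortLoopIter default quoted above.
    static final long SHORT_LOOP_ITER = 1000;

    // Trip count of `for (long i = init; i < limit; i += stride)`, stride > 0.
    // Returns empty when limit - init does not fit in a long.
    public static OptionalLong tripCount(long init, long limit, long stride) {
        if (stride <= 0) {
            throw new IllegalArgumentException("stride must be positive");
        }
        if (init >= limit) {
            return OptionalLong.of(0);
        }
        long span;
        try {
            span = Math.subtractExact(limit, init);
        } catch (ArithmeticException e) {
            return OptionalLong.empty();
        }
        // ceil(span / stride) without forming span + stride - 1, which could overflow.
        return OptionalLong.of(span / stride + (span % stride != 0 ? 1 : 0));
    }

    // Plain long loop; stands in for the general (loop nest) version.
    // Assumes i += stride does not overflow before the exit test fails.
    public static long sumGeneric(long init, long limit, long stride) {
        long sum = 0;
        for (long i = init; i < limit; i += stride) {
            sum += i;
        }
        return sum;
    }

    // Guarded version: take the int counted fast path when the loop is short.
    public static long sum(long init, long limit, long stride) {
        OptionalLong count = tripCount(init, limit, stride);
        if (count.isPresent() && count.getAsLong() <= SHORT_LOOP_ITER) {
            int n = (int) count.getAsLong();
            long sum = 0;
            for (int k = 0; k < n; k++) {
                // init + k * stride stays below limit, so it cannot overflow here.
                sum += init + (long) k * stride;
            }
            return sum;
        }
        // Guard failed: fall back to the general loop (a deoptimization in C2).
        return sumGeneric(init, limit, stride);
    }

    public static void main(String[] args) {
        System.out.println(sum(0, 10, 3));
        System.out.println(sum(0, 5000, 1));
    }
}
```

In this sketch `sum(0, 10, 3)` takes the guarded int path (4 iterations), while `sum(0, 5000, 1)` exceeds the threshold and uses the fallback; both paths compute the same result.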

`compiler/escapeAnalysis/TestMissingAntiDependency.java` fails on Windows x64 and Linux AArch64 with `-XX:StressLongCountedLoop=200000000`:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (workspace\open\src\hotspot\share\opto\gcm.cpp:916), pid=35968, tid=34752
#  assert(use_mem_state != load->find_exact_control(load->in(0))) failed: dependence cycle found
#

Current CompileTask:
C2:710   98    b  4       TestMissingAntiDependency::test (89 bytes)

Stack: [0x0000007bdcb00000,0x0000007bdcc00000],  sp=0x0000007bdcbfbba0,  free space=1006k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x7d2910]  PhaseCFG::insert_anti_dependences+0xe30  (gcm.cpp:916)
V  [jvm.dll+0x7d591f]  PhaseCFG::schedule_late+0x47f  (gcm.cpp:1536)
V  [jvm.dll+0x7d083e]  PhaseCFG::global_code_motion+0x31e  (gcm.cpp:1650)
V  [jvm.dll+0x7cf2ad]  PhaseCFG::do_global_code_motion+0x6d  (gcm.cpp:1780)
V  [jvm.dll+0x55746d]  Compile::Code_Gen+0x19d  (compile.cpp:2953)
V  [jvm.dll+0x555ca0]  Compile::Compile+0x11d0  (compile.cpp:882)
V  [jvm.dll+0x45cfd9]  C2Compiler::compile_method+0x179  (c2compiler.cpp:144)
V  [jvm.dll+0x573a5a]  CompileBroker::invoke_compiler_on_method+0x7aa  (compileBroker.cpp:2317)
V  [jvm.dll+0x570fab]  CompileBroker::compiler_thread_loop+0x33b  (compileBroker.cpp:1976)
V  [jvm.dll+0x8ba602]  JavaThread::thread_main_inner+0x282  (javaThread.cpp:777)
V  [jvm.dll+0xfa95f4]  Thread::call_run+0x1b4  (thread.cpp:236)
V  [jvm.dll+0xd6ae91]  thread_native_entry+0xe1  (os_windows.cpp:566)
C  [ucrtbase.dll+0x2268a]  (no source info available)
C  [KERNEL32.DLL+0x17ac4]  (no source info available)
C  [ntdll.dll+0x5a8c1]  (no source info available)


Maybe it's (related to) [JDK-8341976](https://bugs.openjdk.org/browse/JDK-8341976)?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2635779606


More information about the hotspot-compiler-dev mailing list