RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4]
Emanuel Peter
epeter at openjdk.org
Fri Jun 20 08:46:34 UTC 2025
On Wed, 18 Jun 2025 07:46:21 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> The C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes.
>> This prevents optimizations such as range check elimination, loop unrolling, and auto-vectorization for these loops. Please refer
>> to the detailed discussion of a related performance issue in [1].
>>
>> The ideal graph of such a loop typically looks like:
>>
>>
>> /-----------|
>> | |
>> | ConI |
>> loop | / /
>> | | / /
>> \ AddI /
>> RangeCheck \ / |
>> | \ / |
>> IfTrue Phi |
>> \ | |
>> RangeCheck \ | |
>> \ CastII / <- Range check #1
>> | | /
>> IfTrue | |
>> \ | |
>> CastII | <- Range check #2
>> | /
>> |-------/
>>
>>
>>
>> For a counted loop, the loop induction variable (i.e. `Phi`) should ideally be the direct input of `AddI`. However, in the case above, it is used
>> by two consecutive `CastII` nodes generated by two different range check operations. The compiler should skip all such `CastII` nodes when recognizing a counted loop.
>>
>> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations.
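To make the difference between a single uncast and an iterative uncast concrete, here is a minimal sketch on a toy graph. It is not HotSpot code: `ToyNode`, `UncastSketch`, `uncastOnce` and `uncastAll` are hypothetical names invented for illustration, and treating the pre-patch behaviour as "strip at most one cast" is an assumption drawn from the description above.

```java
// Toy model of an ideal-graph node. This is NOT HotSpot's Node class;
// it only tracks an opcode name and one value input.
final class ToyNode {
    final String op;      // e.g. "Phi", "CastII", "AddI"
    final ToyNode input;  // the single value input we follow (may be null)

    ToyNode(String op, ToyNode input) {
        this.op = op;
        this.input = input;
    }
}

final class UncastSketch {
    // Strip at most one CastII (assumed to mirror the single-step behaviour).
    static ToyNode uncastOnce(ToyNode n) {
        return "CastII".equals(n.op) ? n.input : n;
    }

    // Strip CastII nodes repeatedly until the node on top is not a CastII.
    static ToyNode uncastAll(ToyNode n) {
        while ("CastII".equals(n.op)) {
            n = n.input;
        }
        return n;
    }

    public static void main(String[] args) {
        ToyNode phi = new ToyNode("Phi", null);
        ToyNode cast1 = new ToyNode("CastII", phi);    // range check #1
        ToyNode cast2 = new ToyNode("CastII", cast1);  // range check #2

        System.out.println(uncastOnce(cast2).op); // still a CastII
        System.out.println(uncastAll(cast2).op);  // reaches the Phi
    }
}
```

With two stacked casts, the single step stops at the inner `CastII`, so a matcher that expects a `Phi` there gives up; the loop version reaches the `Phi` regardless of how many casts sit in between.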
>>
>> Test:
>> - Tested tier1, tier2, and tier3; no regressions were found.
>> - An additional test case is added to verify the fix.
>>
>> Performance:
>> Here is the performance gain on an NVIDIA Grace machine (AArch64):
>>
>>
>> Benchmark                        Mode   Cnt  Unit   Before      After        Gain
>> CountedLoopCastIV.loop_iv_int    thrpt   30  ops/s  941482.597  4389292.439  4.66
>> CountedLoopCastIV.loop_iv_long   thrpt   30  ops/s  884563.232  1441485.455  1.62
>>
>>
>> We observe a similar uplift on an x86_64 machine.
>>
>> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Merge branch 'jdk:master' into JDK-8357726
>
> Change-Id: I0c10a563a3873b2220ce4d4c9b999c52159f578f
> - Address review comments on IR test
> - Address review comments on jtreg and jmh tests
> - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times
LGTM, thanks for the work you put in :)
-------------
Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25539#pullrequestreview-2945061713