RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times

Xiaohong Gong xgong at openjdk.org
Fri May 30 07:48:31 UTC 2025


The C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes.
 This prevents optimizations such as range check elimination, loop unrolling, and auto-vectorization for these loops. See [1]
 for a detailed discussion of a related performance issue.

The ideal graph of such a loop typically looks like:


                          /-----------|
                         |            |
                         |   ConI     |
               loop      |  /        /
                 |       | /        /
                  \     AddI       /
      RangeCheck   \    /         |
              |     \  /          |
             IfTrue  Phi          |
                 \    |           |
    RangeCheck    \   |           |
             \    CastII          /     <- Range check #1
              |        |         /
             IfTrue    |        |
                  \    |        |
                  CastII        |       <- Range check #2
                      |        /
                      |-------/



For a counted loop, the loop induction variable (i.e. the `Phi`) should ideally be the direct input of the `AddI`. In the case above, however, the `Phi` is consumed
 by two consecutive `CastII` nodes generated by two different range checks, and the `AddI` takes the second `CastII` as its input. The compiler should skip all such `CastII` nodes when recognizing a counted loop.

This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations.
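
To illustrate the difference between stripping one cast and stripping all of them, here is a minimal toy model in Java. It is not HotSpot code: the `Node` class, the string opcodes, and the `uncastOnce`/`uncastAll` helpers are hypothetical names made up for this sketch, and it assumes that the previous recognition code removed at most one `CastII` level. In the real ideal graph a `CastII` also has a control input, which the sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

public class UncastSketch {
    /** Toy stand-in for a C2 ideal-graph node; NOT a HotSpot class. */
    static final class Node {
        final String op;                          // e.g. "Phi", "CastII", "AddI", "ConI"
        final List<Node> in = new ArrayList<>();  // value inputs only (no control edge)

        Node(String op, Node... inputs) {
            this.op = op;
            for (Node n : inputs) {
                in.add(n);
            }
        }
    }

    /** Strips at most one CastII (assumed behaviour before the patch). */
    static Node uncastOnce(Node n) {
        return n.op.equals("CastII") ? n.in.get(0) : n;
    }

    /** Strips CastII nodes until none remain (the idea described in the patch). */
    static Node uncastAll(Node n) {
        while (n.op.equals("CastII")) {
            n = n.in.get(0);
        }
        return n;
    }

    /** Builds Phi -> CastII -> CastII -> AddI -> (backedge) Phi and returns the AddI. */
    static Node buildLoop() {
        Node phi = new Node("Phi");
        Node cast1 = new Node("CastII", phi);    // range check #1
        Node cast2 = new Node("CastII", cast1);  // range check #2
        Node stride = new Node("ConI");
        Node add = new Node("AddI", cast2, stride);
        phi.in.add(add);                         // close the loop via the backedge
        return add;
    }

    public static void main(String[] args) {
        Node add = buildLoop();
        System.out.println("single uncast reaches:    " + uncastOnce(add.in.get(0)).op);
        System.out.println("iterative uncast reaches: " + uncastAll(add.in.get(0)).op);
    }
}
```

In this toy graph the single-level helper stops at the first `CastII`, so the `AddI` input is not matched against the `Phi`; the iterative helper reaches the `Phi`, which is the shape counted loop recognition expects.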

Test:
 - Tested tier1, tier2, and tier3; no regressions were found.
 - An additional test case is added to verify the fix.

Performance:
Here is the performance gain on an NVIDIA Grace machine (AArch64):


Benchmark                      Mode   Cnt Unit   Before      After        Gain
CountedLoopCastIV.loop_iv_int  thrpt  30  ops/s  941482.597  4389292.439  4.66
CountedLoopCastIV.loop_iv_long thrpt  30  ops/s  884563.232  1441485.455  1.62


A similar uplift is observed on an x86_64 machine.

[1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654

-------------

Commit messages:
 - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times

Changes: https://git.openjdk.org/jdk/pull/25539/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357726
  Stats: 257 lines in 3 files changed: 256 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/25539.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25539/head:pull/25539

PR: https://git.openjdk.org/jdk/pull/25539


More information about the hotspot-compiler-dev mailing list