RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code

Benoît Maillard bmaillard at openjdk.org
Fri Oct 10 12:44:24 UTC 2025


This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations.

### Analysis

This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced
and added to this PR as a regression test.

The test contains a switch inside a loop, and stressing loop peeling results in
a fairly complex graph. The Split-If optimization is applied aggressively, and we
run a verification pass after every step that makes progress.

We end up with a relatively high number of verification passes, with each pass being
fairly expensive because of the size of the graph.
Each verification pass requires building a new `IdealLoopTree`. This is quite slow
(which is unfortunately hard to mitigate), and also causes inefficient memory usage
on the `ciEnv` arena.

The inefficient memory usage is caused by the `ciInstanceKlass::get_field_by_offset` method.
Every call results in:
- One allocation on the `ciEnv` arena to store the returned `ciField`
- The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which:
  - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`)
  - Pushes the new symbol to the `_symbols` array

The `ciField` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to
check whether the `BasicType` of a static field is a reference type.

In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols
(up to several million), which suggests that `ciObjectFactory::get_symbol` is not meant to be called
repeatedly as it is here.

The stack trace leading to `ciInstanceKlass::get_field_by_offset` is shown below:

```
ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412
TypeOopPtr::TypeOopPtr type.cpp:3484
TypeInstPtr::TypeInstPtr type.cpp:3953
TypeInstPtr::make type.cpp:3990
TypeInstPtr::add_offset type.cpp:4509
AddPNode::bottom_type addnode.cpp:696
MemNode::adr_type memnode.cpp:73
PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477
PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439
PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827
PhaseIdealLoop::build_loop_late_post loopnode.cpp:6715
PhaseIdealLoop::build_loop_late loopnode.cpp:6660
PhaseIdealLoop::build_and_optimize loopnode.cpp:5093
PhaseIdealLoop::PhaseIdealLoop loopnode.hpp:1209
PhaseIdealLoop::verify loopnode.cpp:5336
...
```

Because the `ciEnv` arena is not freed between verification passes, it keeps growing and hits
the memory limit after about 30 seconds of execution in this case.

### Proposed fix

As explained in the previous section, the only purpose of the `ciInstanceKlass::get_field_by_offset`
call is to obtain the `BasicType` of the field. By carefully inspecting this method,
we notice that the field descriptor `fd` already contains the type information we need.
We do not actually need all the information embedded in the `ciField` object.

```c++
ciField* ciInstanceKlass::get_field_by_offset(int field_offset, bool is_static) {
  if (!is_static) {
    for (int i = 0, len = nof_nonstatic_fields(); i < len; i++) {
      ciField* field = _nonstatic_fields->at(i);
      int  field_off = field->offset_in_bytes();
      if (field_off == field_offset)
        return field;
    }
    return nullptr;
  }
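  // Static field: look it up through the VM's field descriptor.
  VM_ENTRY_MARK;
  InstanceKlass* k = get_instanceKlass();
  fieldDescriptor fd;
  if (!k->find_field_from_offset(field_offset, is_static, &fd)) {
    return nullptr;
  }
  // Allocates a new ciField on the ciEnv arena. The ciField constructor also
  // calls ciObjectFactory::get_symbol, which can allocate a new ciSymbol.
  ciField* field = new (CURRENT_THREAD_ENV->arena()) ciField(&fd);
  return field;
}
```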

Hence, we can simply add a more specialized method, `ciInstanceKlass::get_field_type_by_offset`,
that directly returns the `BasicType` without creating a `ciField`. This also
avoids all three memory allocations described above.

After this change, the memory usage of the `ciEnv` arena stays constant across verification
passes.

### Testing
- [x] Added test obtained from the fuzzer (and reduced with c-reduce)
- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8366990)
- [x] tier1-3, plus some internal testing

Thank you for reviewing!

-------------

Commit messages:
 - Minor comments and style changes
 - 8366990: Add reduced test from the fuzzer
 - 8366990: Avoid growing ciEnv arena in TypeOopPtr::TypeOopPtr

Changes: https://git.openjdk.org/jdk/pull/27731/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366990
  Stats: 168 lines in 4 files changed: 161 ins; 2 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27731.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731

PR: https://git.openjdk.org/jdk/pull/27731

