RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2]
    Christian Hagedorn 
    chagedorn at openjdk.org
       
    Fri Oct 10 15:09:54 UTC 2025
    
    
  
On Fri, 10 Oct 2025 13:25:51 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:
>> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations.
>> 
>> ### Analysis
>> 
>> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced
>> and added to this PR as a regression test.
>> 
>> The test contains a switch inside a loop, and stressing the loop peeling results in
>> a fairly complex graph.  The split-if optimization is applied agressively, and we
>> run a verification pass at every progress made.
>> 
>> We end up with a relatively high number of verification passes, with each pass being
>> fairly expensive because of the size of the graph.
>> Each verification pass requires building a new `IdealLoopTree`. This is quite slow
>> (which is unfortunately hard to mitigate), and also causes inefficient memory usage
>> on the `ciEnv` arena.
>> 
>> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method.
>> At every call, we have
>> - One allocation on the `ciEnv` arena to store the returned `ciField`
>> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which:
>>   - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`)
>>   - Pushes the new symbol to the `_symbols` array
>> 
>> The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to
>> check if the `BasicType` of a static field is a reference type.
>> 
>> In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols
>> (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called
>> repeatedly as it is done here.
>> 
>> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below:
>> 
>> 
>> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412
>> TypeOopPtr::TypeOopPtr type.cpp:3484
>> TypeInstPtr::TypeInstPtr type.cpp:3953
>> TypeInstPtr::make type.cpp:3990
>> TypeInstPtr::add_offset type.cpp:4509
>> AddPNode::bottom_type addnode.cpp:696
>> MemNode::adr_type memnode.cpp:73
>> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477
>> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439
>> PhaseIdealLoop::build_lo...
>
> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add -XX:+UnlockDiagnosticVMOptions
Nice summary and solution! I have a few comments but otherwise, the fix looks good to me. 
I guess it's a discussion for another time if we also want to improve the verification time somehow. But that should not block this PR.
src/hotspot/share/ci/ciInstanceKlass.cpp line 434:
> 432: //
> 433: // This is essentially a shortcut for:
> 434: //  get_field_type_by_offset(field_offset, is_static)->layout_type()
`get_field_by_offset()`?
Suggestion:
//  get_field_by_offset(field_offset, is_static)->layout_type()
src/hotspot/share/ci/ciInstanceKlass.cpp line 436:
> 434: //  get_field_type_by_offset(field_offset, is_static)->layout_type()
> 435: // except this does not require allocating memory for a new ciField
> 436: BasicType ciInstanceKlass::get_field_type_by_offset(int field_offset, bool is_static) {
Nit:
Suggestion:
BasicType ciInstanceKlass::get_field_type_by_offset(const int field_offset, const bool is_static) {
src/hotspot/share/ci/ciInstanceKlass.cpp line 443:
> 441:       if (field_off == field_offset)
> 442:         return field->layout_type();
> 443:     }
Could this code be shared with `get_field_by_offset()`? We could put it into a method and return the field.
Not sure if it's also worth for the field descriptor below when having a "get field descriptor" method to further share code. You would need to check. Anyway, I'm fine with both :-)
test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 1:
> 1: package compiler.loopopts;
For consistency, I suggest to move it after the copyright
test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 39:
> 37:  *      -XX:-TieredCompilation -Xcomp -XX:CompileCommand=dontinline,*::*
> 38:  *      -XX:+StressLoopPeeling -XX:PerMethodTrapLimit=0 -XX:+VerifyLoopOptimizations
> 39:  *      -XX:StressSeed=1870557292
I suggest to either remove the stress seed since it might not trigger anymore in later builds. Usually, we add a run with a fixed stress seed and one without but since this test requires to do just some work verification work, I would suggest to not add two runs but only one without fixed seed.
Another question: How close are we to hit the default the memory limit with this test? With your fix it probably consumes not much memory anymore. I therefore suggest to add  `MemLimit` as additional flag with a much smaller value to be sure that your fix works as expected (you might need to check how low we can choose the limit to not run into problems in higher tiers).
-------------
PR Review: https://git.openjdk.org/jdk/pull/27731#pullrequestreview-3324083968
PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420663368
PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420700310
PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420698361
PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420701817
PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420721807
    
    
More information about the hotspot-compiler-dev
mailing list