RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6]

Fri Dec 12 22:21:53 UTC 2025

On Fri, 12 Dec 2025 05:13:11 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return.
>> 
>> This PR tries to determine the escape status of an object at a load. If the object has not escaped at that point, we can try to fold the load aggressively, looking past calls and memory barriers to find the corresponding store that the load observes.
>> 
>> As for the runtime cost, this phase runs very fast: around 5-7% of the runtime of EA, and about 0.5% of the total runtime of C2.
>> 
>> Please take a look and leave your thoughts, thanks a lot.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - Merge branch 'master' into foldmem
>  - grammar, safe change
>  - more detailed explanations
>  - store values need normalizing
>  - Just use candidate_set directly
>  - Some runtime calls may receive a derived pointer but not the base
>  - Aggressively fold loads from objects that have not escaped

Some more thoughts/ideas:

So, an object can escape either through a store to memory or as an argument to a call. (Any other scenarios?) 

If we leave memory graph considerations aside, then traversing the control graph from a barrier (call/membar) up to the allocation should enumerate all calls and stores in that range. (All stores have a control input.)

(Theoretically, a store's control can end up higher in the control graph, but I don't think that happens in practice.)

If a call/store has a data dependency on the allocation, then it's an escape point.

One remaining case is the following: if a store has its control in the region, it can be scheduled after the region unless the store dominates the barrier in the memory graph. But, conservatively, it can also be treated as an escape point interfering with the access being optimized.

So, either a CFG-only or a CFG+memory traversal (plus traversing the data inputs of call arguments) should detect whether an interfering escape point is present.
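To make the CFG-only variant concrete, here is a rough C++ sketch. Everything in it is hypothetical: `Node`, `Kind`, `derived_from` and `escapes_before` form a toy IR, not C2's `Node` API, and the memory graph is ignored. It walks control predecessors from the barrier up to the allocation and flags any call whose argument is derived from the allocation, or any store pinned in that range whose stored value is. (Checking only the stored value, not the address, is my assumption: stores into the object's own fields are exactly what the folding wants to find.)

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified IR for illustration; NOT C2's Node API.
enum class Kind { Start, Allocate, Region, Call, Store, MemBar, AddP, Other };

struct Node {
  Kind kind = Kind::Other;
  std::vector<Node*> ctrl_in;  // control predecessors (several for a Region)
  std::vector<Node*> data_in;  // Call: arguments; Store: {address, value}; AddP: {base}
  std::vector<Node*> pinned;   // stores whose control input is this node
};

// True if `n` is `alloc` or is computed from it via data inputs
// (e.g. an AddP derived pointer), so handing `n` out may leak the object.
static bool derived_from(const Node* n, const Node* alloc,
                         std::unordered_set<const Node*>& seen) {
  if (n == alloc) return true;
  if (!seen.insert(n).second) return false;  // already explored
  for (const Node* in : n->data_in)
    if (derived_from(in, alloc, seen)) return true;
  return false;
}

// Walks the control graph upward from `barrier` (inclusive) to `alloc` and
// reports whether a call or store in that range may let the object escape.
bool escapes_before(const Node* barrier, const Node* alloc) {
  assert(barrier != nullptr && alloc != nullptr);
  std::unordered_set<const Node*> visited_ctrl;
  std::unordered_set<const Node*> seen_data;
  std::vector<const Node*> worklist{barrier};
  while (!worklist.empty()) {
    const Node* c = worklist.back();
    worklist.pop_back();
    if (!visited_ctrl.insert(c).second) continue;
    // Walked past the allocation without meeting it: be conservative.
    if (c->kind == Kind::Start) return true;
    if (c->kind == Kind::Call)
      for (const Node* arg : c->data_in)
        if (derived_from(arg, alloc, seen_data)) return true;  // passed to a call
    for (const Node* st : c->pinned)
      if (st->kind == Kind::Store && st->data_in.size() > 1 &&
          derived_from(st->data_in[1], alloc, seen_data))
        return true;  // object's reference stored to memory
    if (c == alloc) continue;  // do not walk above the allocation
    for (const Node* pred : c->ctrl_in) worklist.push_back(pred);
  }
  return false;
}
```

The search only touches control nodes between the barrier and the allocation plus the data inputs hanging off them, which is the localized cost I mention below.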

Do you see any flaws in my reasoning?

As for the associated costs, the analysis doesn't look prohibitively expensive: the search is localized and doesn't involve traversing the whole graph.

Alternatively, the results of previous analysis requests can be cached. The property changes monotonically: a previously non-escaping case can't turn into an escaping one later. If the cache is not invalidated, then the worst case is a missed optimization opportunity.
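The caching idea could look roughly like this C++ sketch. `EscapeCache` and the `Query` callback are hypothetical names, and keying on (allocation, barrier) node indices assumes those indices stay stable while the cache is alive. Because of monotonicity, a cached "does not escape" answer stays valid, while a cached "escapes" answer can only go stale in the conservative direction.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical memoization layer over an escape query; not existing C2 code.
// Assumption: node indices stay stable while the cache is alive; if the graph
// were renumbered, the cache would have to be dropped.
class EscapeCache {
 public:
  // Returns true if the allocation may have escaped at the barrier.
  using Query = std::function<bool(uint32_t alloc_idx, uint32_t barrier_idx)>;

  explicit EscapeCache(Query query) : query_(std::move(query)) {
    assert(query_ && "an escape query is required");
  }

  bool escapes(uint32_t alloc_idx, uint32_t barrier_idx) {
    const auto key = std::make_pair(alloc_idx, barrier_idx);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      // A cached "false" is permanent (monotonicity). A stale "true" is
      // still safe: at worst it hides a folding opportunity.
      return it->second;
    }
    const bool result = query_(alloc_idx, barrier_idx);
    cache_.emplace(key, result);
    return result;
  }

 private:
  Query query_;
  std::map<std::pair<uint32_t, uint32_t>, bool> cache_;
};
```

Never invalidating is the simplest policy; entries that say "escapes" could optionally be re-queried later to recover some of the missed opportunities.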

Speaking of the general approach, if the analysis part turns out to be way too
expensive for IGVN, I'd still prefer to keep the analysis and the transformation
separate and to use IGVN to perform the actual IR changes.

There's already some duplication and divergence between the IGVN and `PhaseLoadFolding`
implementations. Without proper care, it can easily get worse in the future.

Another thing to consider: it's beneficial to perform such transformations
earlier, as the IGVN case illustrates. (For example, by the time EA kicks in,
inlining is over.) A shared implementation is easier to maintain and reuse.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3648339738
PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3648340281