RFR: 8325681: C2 inliner rejects to inline a deeper callee because the methoddata of caller is immature. [v2]

Tue Feb 27 19:44:42 UTC 2024

On Tue, 27 Feb 2024 19:37:00 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>>> You mean method_data()->is_mature() was introduced 20 years ago? I didn't capture this. git blame can only trace back to '8153779ad32d1e8ddd37ced826c76c7aafc61894', or "Initial load" in 2007.
>> 
>> It predates OpenJDK and unfortunately the repository doesn't have any history left from that period. 
>> 
>>> May I know more about 'excessive recompilation'? what's your definition of it?
>> why does using immature methoddata lead to excessive recompilation?
>> 
>> C2 heavily relies on profiling when performing speculative optimizations. The assumption is the profile is accurate enough to make all the speculations a net win. Optimizing a large enough application consists of numerous individual guesses and some of the predictions routinely fail at runtime causing expensive to recover situations (e.g., "just" an uncommon trap or a signal caused by an implicit exception followed by full reprofile-recompile cycle). If profile data is of a poor quality, it can mislead the compiler to make suboptimal choices and worsen the situation.  
>> 
>> So, C2 heuristics are tuned to let the profile mature assuming the data becomes more representative over time. (Relevant comment in [methodData.hpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/oops/methodData.hpp#L41).)
>> 
>> Based on the synopsis, the motivation for the fix is to help C2 EA pass. Indeed, there are many situations when a full-blown call in generated code severely affects the quality of optimizations. On the other hand, it a method is inlined prematurely, it may cause a net performance loss as well. For example, when compiled code is invalidated shortly after the compilation, the resources spent on the compilation are wasted  and it slows down application warmup.
>> 
>> There were some discussions in the past about EA-aware inlining heuristics. Unfortunately, not much progress has been done on that front yet. Partly, because it's notoriously hard and risky to incrementally improve inlining heuristics: they have pervasive effects, so even trivial changes easily cause numerous improvements and regressions across a wide set of applications.
>> 
>> Speaking of your particular case, as an idea: maybe a static analysis of the callee method may give enough confidence that profile quality doesn't matter there, so it's safe to inline when there are clear benefits for EA? C2 EA already perform some amount of analysis to detect when objects arg-escape (see  [BCEscapeAnalyzer](https://github.com/openjdk/jdk/blob/maste...
>
>> I don't think '_count field of ciCallProfile = -1' is correct for an immature method. it forces c2 to outline a call. C2 should make judgement based on the information HotSpot has collected. The real frequency is > than MinInlineFrequencyRatio.
> 
> Indeed, that does look excessive. As another idea: utilize frequencies, but not type profiles at call sites with immature profiles. But then the next question is how representative the profile data at callee side then...

Actually, there's a similar problematic scenario, now at allocation site: a local allocation followed by a long running loop.

A a = factoryA::make(...); // new A(...)
for (int i = 0; i < large_count; i++) {
   // ... a is eligible for scalarization ...
}

Inlining `make` method is a prerequisite to scalarize `a`, but profiling data is so scarce and hard-to-gather (a sample per long-running loop), so it's impractical to wait until profiling is over. It's straightforward to prove that `make` frequency is 100% of total executions (since it dominates the loop), but absolute counts don't make it evident.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17957#discussion_r1504865993