Leverage profiled compiled size to avoid aggressive inlining and code growth

Fei Gao Fei.Gao2 at arm.com
Fri Sep 26 15:15:15 UTC 2025


Please post replies to hotspot-compiler-dev at openjdk.org instead of hotspot-compiler-dev at openjdk.java.net.

Sorry for the repetition.

Hi leyden-dev and hotspot-compiler-dev,

TL;DR

https://github.com/openjdk/jdk/pull/27527

I proposed a PoC that explores leveraging profiled compiled sizes to improve C2 inlining decisions and mitigate code bloat. The approach records method sizes during a pre-run and feeds them back via compiler directives, helping to reduce aggressive inlining of large methods.

Testing on Renaissance and SPECjbb2015 showed clear code size differences but no significant performance impact on either AArch64 or x86. An alternative AOT-cache-based approach was also evaluated but did not produce meaningful code size changes.

Open questions remain about the long-term value of profiling given Project Leyden's direction of caching compiled code in AOT, and whether global profiling information could help C2 make better inlining decisions.

1. Motivation

In the current C2 behavior, the inliner only considers the estimated inlined size [1] [2] of a callee if the method has already been compiled by C2. In particular, C2 will explicitly reject inlining in the following cases:
1) hot methods with bytecode size > FreqInlineSize (default 325) [3]
2) cold methods with bytecode size > MaxInlineSize (default 35)

However, a common situation arises where a method's bytecode size is below 325, yet once compiled by C2 it produces a very large machine code body. If this method has not been compiled at the time its caller is being compiled, the inliner may aggressively inline it, potentially bloating the caller, even though an independent compiled copy might eventually exist.

To mitigate such cases, we can make previously profiled compiled sizes available early, allowing the inliner to make more informed decisions and reduce excessive code growth.

[1] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L180
[2] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L274
[3] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L184

2. Proof of Concept

To validate this idea, I created a proof-of-concept: https://github.com/openjdk/jdk/pull/27527

In this PoC:

1) A dumping interface was added to record C2-compiled method sizes, enabled via the `-XX:+PrintOptoMethodSize` flag.

2) A new attribute was introduced in InlineMatcher: _inline_instructions_size. This attribute stores the estimated inlined size of a method, sourced from a compiler directive JSON file generated during a prior profiling run.

3) The inliner was updated to use these previously profiled method sizes to prevent aggressive inlining of large methods.
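For illustration, a directives file carrying such a profiled size might look roughly like the following. This is only a sketch: the `match` pattern is a hypothetical example method, and the exact encoding of the size within the reused `inline` attribute is defined by the PoC, not by this sketch.

```
[
  {
    "match": "com/example/Foo.bar",
    "inline": [ "com/example/Foo.bar,2048" ]
  }
]
```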

3. How to Use

To apply this approach to any workload, the workload must be run twice:
1) Pre-run: collect inlined sizes for all C2-compiled methods.
2) Product run: use the profiled method sizes to improve C2 inlining.

Step 1 Profile method size (pre-run)

Log the compiled method size:
`-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:+PrintOptoMethodSize -XX:LogFile=logmethodsize.out`
This generates a log containing method size information from C2.

Step 2 Generate the compiler directive file

Use the provided Python script to extract method size info and generate a JSON file:
`python3 extract_size_to_directives.py logmethodsize.out output_directives.json`

This file contains estimated inlined sizes to guide inlining decisions in the product run. If the same method is compiled multiple times, the script conservatively retains the smallest observed size.
Note: Methods that are not accepted by the CompilerDirective format need to be excluded.
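As a rough illustration of what such an extraction script does, here is a minimal sketch. Both the log line format (`OptoMethodSize: <method> <size>`) and the emitted directive syntax are assumptions made for this sketch; the actual script in the PoC defines both.

```python
import json
import re

# Assumed log line format (hypothetical), e.g.:
#   "OptoMethodSize: com/example/Foo.bar 2048"
LINE_RE = re.compile(r"OptoMethodSize:\s+(\S+)\s+(\d+)")

def extract_sizes(lines):
    """Collect the smallest observed C2-compiled size per method."""
    sizes = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        method, size = m.group(1), int(m.group(2))
        # If the method was compiled more than once, conservatively
        # keep the smallest observed size.
        if method not in sizes or size < sizes[method]:
            sizes[method] = size
    return sizes

def to_directives(sizes):
    """Emit a compiler-directives-style list (exact syntax assumed)."""
    return [{"match": method, "inline": ["%s,%d" % (method, size)]}
            for method, size in sorted(sizes.items())]

def convert(log_path, out_path):
    """Read a PrintOptoMethodSize log and write a directives JSON file."""
    with open(log_path) as f:
        sizes = extract_sizes(f)
    with open(out_path, "w") as f:
        json.dump(to_directives(sizes), f, indent=2)
```

A real script would additionally filter out methods whose names are not accepted by the CompilerDirective match format, as noted below.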

Step 3 Use the compiler directive in a product run

Pass the generated JSON to the JVM as a directive:
`-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=output_directives.json`
This enables the inliner to make decisions using previously profiled method sizes, potentially avoiding aggressive inlining of large methods.
Note: The patch reuses the existing `inline` directive attribute for inlining control. If multiple inline rules match the same method, only the first match is effective.

4. Testing

I tested the following workloads using the method above and measured the code cache size with `-XX:+PrintCodeCache`. The results are shown below, compared against the mainline code. All statistics (min, max, median, mean) are based on three runs.

(patch - mainline) / mainline

1) Renaissance.dotty

Code size change:

AArch64:

```
used           min      max      median   mean
non-profiled   -9.88%   -8.13%   -8.92%   -8.98%
profiled       -0.73%   -0.21%   -0.40%   -0.45%
non-nmethods   -15.20%  -0.02%   -14.92%   -10.32%
codecache      -2.82%   -2.88%   -2.97%   -2.89%
max_used       min      max      median   mean
non-profiled   -9.88%   -8.13%   -8.92%   -8.98%
profiled       2.37%    1.41%    1.50%    1.76%
non-nmethods   -0.95%   -1.73%   -0.93%   -1.21%
codecache      -0.35%   -1.00%   -0.95%   -0.77%
```

X86:

```
used            min      max      median   mean
non-profiled    -9.72%   -9.61%   -9.36%   -9.56%
profiled        -0.81%   -0.90%   -1.15%   -0.95%
non-nmethods    -0.04%   0.04%    -0.02%   -0.01%
codecache       -2.94%   -2.96%   -3.11%   -3.00%
max_used        min      max      median   mean
non-profiled    -9.72%   -9.61%   -9.36%   -9.56%
profiled        2.32%    2.60%    2.51%    2.48%
non-nmethods    -0.63%   -2.25%   -1.28%   -1.39%
codecache       -0.68%   -0.59%   -0.70%   -0.66%
```

No significant performance changes were observed on either platform.

2) SPECjbb2015

Code size change:

AArch64:

```
used           min      max       median    mean
non-profiled   -1.00%   -11.68%   -12.73%   -8.62%
profiled       9.07%    -6.93%    -2.34%    -0.29%
non-nmethods   0.02%    -0.02%    0.00%     0.00%
codecache      2.98%    -7.18%    -5.35%    -3.28%
max_used       min      max       median    mean
non-profiled   -10.85%  -11.68%   -12.73%   -11.76%
profiled       -2.09%   -11.65%   -1.26%   -5.62%
non-nmethods   0.13%    -1.21%    -0.16%   -0.41%
codecache      -6.42%   -6.33%    -6.10%   -6.29%
```

On the AArch64 platform, no significant performance changes were observed for either high-bound IR or max jOPS.

For critical jOPS:
```
Min      Median   Mean     Max     Var%
-2.45%   -1.87%   -2.45%   -3.00%  1.9%
```

X86:

```
used           min      max      median   mean
non-profiled   -9.02%   -9.65%   -7.93%   -8.87%
profiled       -6.09%   -3.18%   -4.52%   -4.61%
non-nmethods   -0.02%   0.25%    0.04%    0.09%
codecache      -5.36%   -4.75%   -4.58%   -4.90%
max_used       min      max      median   mean
non-profiled   -4.03%   -9.65%   -7.93%   -7.23%
profiled       -2.86%   1.16%    -1.03%   -0.93%
non-nmethods   0.02%    -0.08%   0.08%    0.01%
codecache      -0.23%   -4.20%   -3.70%   -2.73%
```

No significant performance change was observed on the x86 platform.

5. AOT cache

The current procedure above requires three steps:
1) a pre-run to record method sizes,
2) a separate step to process the JSON file,
3) a product run using the profiled method sizes.

This workflow may add extra burden to workload deployment.

With JEP 515 [4], we can instead store the estimated inlined size in the AOT cache when `ciMethod::inline_instructions_size()` is called during the premain run, and later load this size from the AOT cache during the product run [5].

The store-load mechanism for inlined size can help reduce the overhead of recomputing actual sizes, but it does not provide the inliner with much additional information about the callee, since the compilation order in the product run generally follows that of the premain run, even if not exactly.

To give the inliner more profiled information about callees, I tried another simple draft that records inlined sizes for more C2-compiled methods:
https://github.com/openjdk/jdk/pull/27519/commits/ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d

However, with this draft using the AOT cache, I did not observe any significant code size changes for any workloads. This may require further investigation.

[4] https://openjdk.org/jeps/515
[5] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ciMethod.cpp#L1152

6. Questions

1) Relation to Project Leyden

Project Leyden aims to enhance the AOT cache to store compiled code from training runs [6]. This suggests that we may eventually prefer to load compiled code directly from the AOT cache rather than rely solely on JIT compilation.

Given this direction, is it still worthwhile to invest further in using profiled method sizes as a means to improve inlining heuristics?

Could such profiling provide complementary benefits even if compiled code is cached?

2) Global profiling information for C2

Should we consider leveraging profiled information stored in the AOT cache to give the C2 inliner a broader, more global view of methods, enabling better inlining decisions?

For example, could global visibility into method sizes and call sites help address pathological cases of code bloat or missed optimization opportunities? [7]

[6] https://openjdk.org/jeps/8335368
[7] https://wiki.openjdk.org/display/hotspot/inlining

I'd greatly appreciate any feedback. Thank you for your time and consideration.

Thanks,
Fei