Leverage profiled compiled size to avoid aggressive inlining and code growth
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Sep 26 15:40:16 UTC 2025
Hi Fei,
I think you stumbled on `InlineSmallCode` (1000 or 1500) issues. The flag
is used exactly to filter out inlining of methods whose previously
compiled code was big. But it does not always help - for example, when the
method is inlined, some paths may be removed due to constant (exact klass)
propagation, or EA may eliminate some allocations. We know about this
limitation and have numerous RFEs to improve it.
On the other hand, the FreqInlineSize and MaxInlineSize flags are based on
the bytecode size of a method, which is more stable.
Note that AOT profile caching also preserves C2 inlining decisions, which
are used during JIT compilation in the production run to reproduce the
compilation decisions made in the training run.
We don't advise using JSON. Please store the information in the AOT cache
instead.
Regards,
Vladimir K
On 9/26/25 8:15 AM, Fei Gao wrote:
> Post to hotspot-compiler-dev at openjdk.org instead of
> hotspot-compiler-dev at openjdk.java.net.
>
> Sorry for the repetition.
>
> *From: *Fei Gao <Fei.Gao2 at arm.com>
> *Date: *Friday, 26 September 2025 at 14:52
> *To: *leyden-dev <leyden-dev at openjdk.org>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> *Subject: *Leverage profiled compiled size to avoid aggressive inlining and code growth
>
> Hi @leyden-dev and @hotspot compiler,
>
> *TL;DR*
>
> https://github.com/openjdk/jdk/pull/27527
>
> I proposed a PoC that explores leveraging profiled compiled sizes to
> improve C2 inlining decisions and mitigate code bloat. The approach
> records method sizes during a pre-run and feeds them back via compiler
> directives, helping to reduce aggressive inlining of large methods.
>
> Testing on Renaissance and SPECjbb2015 showed clear code size
> differences but no significant performance impact on either AArch64 or
> x86. An alternative AOT-cache-based approach was also evaluated but did
> not produce meaningful code size changes.
>
> Open questions remain about the long-term value of profiling given
> Project Leyden's direction of caching compiled code in AOT, and whether
> global profiling information could help C2 make better inlining decisions.
>
> *1. Motivation*
>
> In the current C2 behavior, the inliner only considers the estimated
> inlined size [1] [2] of a callee if the method has already been compiled
> by C2. In particular, C2 will explicitly reject inlining in the
> following cases:
>
> Hot methods with bytecode size > FreqInlineSize (325) [3]
>
> Cold methods with bytecode size > MaxInlineSize (35)
>
> However, a common situation arises where a method's bytecode size is
> below 325, yet once compiled by C2 it produces a very large machine code
> body. If this method has not been compiled at the time its caller is
> being compiled, the inliner may aggressively inline it, potentially
> bloating the caller, even though an independent compiled copy might
> eventually exist.
>
> To mitigate such cases, we can make previously profiled compiled sizes
> available early, allowing the inliner to make more informed decisions
> and reduce excessive code growth.
>
> [1] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L180
>
> [2] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L274
>
> [3] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L184
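>
> For readers less familiar with these checks, here is a small self-contained
> sketch of the size thresholds described above. It only models the decision;
> the struct, function, and default values are illustrative (InlineSmallCode's
> default is platform-dependent), not the actual code in bytecodeInfo.cpp.
>
> ```
> #include <cstdio>
>
> // Models the C2 size checks: bytecode-size thresholds always apply, but the
> // compiled (machine-code) size is consulted only if C2 code already exists.
> struct Callee {
>   int  bytecode_size;         // compared against FreqInlineSize/MaxInlineSize
>   bool has_c2_code;           // has the callee already been compiled by C2?
>   int  compiled_inline_size;  // estimated inlined size, known only if compiled
> };
>
> bool size_checks_reject_inline(const Callee& c, bool call_site_is_hot,
>                                int FreqInlineSize = 325, int MaxInlineSize = 35,
>                                int InlineSmallCode = 1500) {
>   if (call_site_is_hot  && c.bytecode_size > FreqInlineSize) return true;
>   if (!call_site_is_hot && c.bytecode_size > MaxInlineSize)  return true;
>   if (c.has_c2_code && c.compiled_inline_size > InlineSmallCode) return true;
>   return false;  // size checks pass; other inlining heuristics still apply
> }
>
> int main() {
>   // Small bytecode body that expands to a lot of machine code once compiled.
>   Callee m{300, /*has_c2_code=*/false, /*compiled_inline_size=*/0};
>   printf("before callee is compiled: reject=%d\n", size_checks_reject_inline(m, true));
>   m.has_c2_code = true;
>   m.compiled_inline_size = 8000;
>   printf("after callee is compiled:  reject=%d\n", size_checks_reject_inline(m, true));
>   return 0;
> }
> ```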
>
> *2. Proof of Concept*
>
> To validate this idea, I created a proof of concept: https://github.com/openjdk/jdk/pull/27527
>
> In this PoC:
>
> 1) A dumping interface was added to record C2-compiled method sizes,
> enabled via the `-XX:+PrintOptoMethodSize` flag.
>
> 2) A new attribute was introduced in InlineMatcher:
> _inline_instructions_size. This attribute stores the estimated inlined
> size of a method, sourced from a compiler directive JSON file generated
> during a prior profiling run.
>
> 3) The inliner was updated to use these previously profiled method sizes
> to prevent aggressive inlining of large methods.
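>
> As a rough illustration of 3), the idea is that a size supplied through the
> directives file can stand in for the compiled size when the callee has not
> been compiled yet in this run. The function and parameter names below are
> illustrative only, not the code in the PR:
>
> ```
> // If the callee already has C2 code, use its real compiled size; otherwise
> // fall back to the size recorded in the directives file from the pre-run.
> int effective_inline_size(bool has_c2_code, int compiled_inline_size,
>                           int directive_profiled_size /* -1 if absent */) {
>   if (has_c2_code)                  return compiled_inline_size;
>   if (directive_profiled_size >= 0) return directive_profiled_size;
>   return -1;  // unknown: only the bytecode-size checks can be applied
> }
>
> // The inliner can then apply an InlineSmallCode-style threshold even to
> // callees that have not been compiled yet in the current run.
> bool reject_due_to_profiled_size(int effective_size, int threshold) {
>   return effective_size >= 0 && effective_size > threshold;
> }
> ```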
>
> *3. How to Use*
>
> To apply this approach to any workload, the workload must be run twice:
>
> 1) Pre-run: collect inlined sizes for all C2-compiled methods.
>
> 2) Product run: use the profiled method sizes to improve C2 inlining.
>
> Step 1: Profile method sizes (pre-run)
>
> Log the compiled method size:
>
> `-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:+PrintOptoMethodSize -XX:LogFile=logmethodsize.out`
>
> This will generate a log containing method size information from C2.
>
> Step 2: Generate the compiler directive file
>
> Use the provided Python script to extract method size info and generate
> a JSON file:
>
> `python3 extract_size_to_directives.py logmethodsize.out output_directives.json`
>
> This file contains estimated inlined sizes to guide inlining decisions
> in the product run. If the same method is compiled multiple times, the
> script conservatively retains the smallest observed size.
>
> Note: Methods that are not accepted by the CompilerDirective format need
> to be excluded.
>
> Step 3: Use the compiler directive in a product run
>
> Pass the generated JSON to the JVM as a directive:
>
> `-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=output_directives.json`
>
> This enables the inliner to make decisions using previously profiled
> method sizes, potentially avoiding aggressive inlining of large methods.
>
> Note: The patch reuses the existing `inline` directive attribute for
> inlining control. If multiple inline rules match the same method, only
> the first match is effective.
>
> *4. Testing*
>
> I tested the following workloads using the method above and measured the
> code cache size with `-XX:+PrintCodeCache`. The results are shown below,
> compared against the mainline code. All statistics (min, max, median,
> mean) are based on three runs.
>
> (patch - mainline) / mainline
>
> 1) Renaissance.dotty
>
> Code size change:
>
> AArch64:
>
> ```
> used          min      max      median   mean
> non-profiled  -9.88%   -8.13%   -8.92%   -8.98%
> profiled      -0.73%   -0.21%   -0.40%   -0.45%
> non-nmethods  -15.20%  -0.02%   -14.92%  -10.32%
> codecache     -2.82%   -2.88%   -2.97%   -2.89%
>
> max_used      min      max      median   mean
> non-profiled  -9.88%   -8.13%   -8.92%   -8.98%
> profiled      2.37%    1.41%    1.50%    1.76%
> non-nmethods  -0.95%   -1.73%   -0.93%   -1.21%
> codecache     -0.35%   -1.00%   -0.95%   -0.77%
> ```
>
> X86:
>
> ```
> used          min      max      median   mean
> non-profiled  -9.72%   -9.61%   -9.36%   -9.56%
> profiled      -0.81%   -0.90%   -1.15%   -0.95%
> non-nmethods  -0.04%   0.04%    -0.02%   -0.01%
> codecache     -2.94%   -2.96%   -3.11%   -3.00%
>
> max_used      min      max      median   mean
> non-profiled  -9.72%   -9.61%   -9.36%   -9.56%
> profiled      2.32%    2.60%    2.51%    2.48%
> non-nmethods  -0.63%   -2.25%   -1.28%   -1.39%
> codecache     -0.68%   -0.59%   -0.70%   -0.66%
> ```
>
> No significant performance changes were observed on either platform.
>
> 2) SPECjbb 2015
>
> Code size change:
>
> AArch64:
>
> ```
> used          min      max      median   mean
> non-profiled  -1.00%   -11.68%  -12.73%  -8.62%
> profiled      9.07%    -6.93%   -2.34%   -0.29%
> non-nmethods  0.02%    -0.02%   0.00%    0.00%
> codecache     2.98%    -7.18%   -5.35%   -3.28%
>
> max_used      min      max      median   mean
> non-profiled  -10.85%  -11.68%  -12.73%  -11.76%
> profiled      -2.09%   -11.65%  -1.26%   -5.62%
> non-nmethods  0.13%    -1.21%   -0.16%   -0.41%
> codecache     -6.42%   -6.33%   -6.10%   -6.29%
> ```
>
> On the AArch64 platform, no significant performance changes were
> observed for either high-bound IR or max jOPS.
>
> For critical jOPS:
>
> ```
> Min      Median   Mean     Max      Var%
> -2.45%   -1.87%   -2.45%   -3.00%   1.9%
> ```
>
> X86:
>
> ```
> used          min      max      median   mean
> non-profiled  -9.02%   -9.65%   -7.93%   -8.87%
> profiled      -6.09%   -3.18%   -4.52%   -4.61%
> non-nmethods  -0.02%   0.25%    0.04%    0.09%
> codecache     -5.36%   -4.75%   -4.58%   -4.90%
>
> max_used      min      max      median   mean
> non-profiled  -4.03%   -9.65%   -7.93%   -7.23%
> profiled      -2.86%   1.16%    -1.03%   -0.93%
> non-nmethods  0.02%    -0.08%   0.08%    0.01%
> codecache     -0.23%   -4.20%   -3.70%   -2.73%
> ```
>
> No significant performance change was observed on the x86 platform.
>
> *5. AOT cache*
>
> The current procedure above requires three steps:
>
> a pre-run to record method sizes,
>
> a separate step to process the JSON file,
>
> and finally a product run using the profiled method sizes.
>
> This workflow may add extra burden to workload deployment.
>
> With JEP 515 [4], we can instead store the estimated inlined size in the
> AOT cache when ciMethod::inline_instructions_size() is called during the
> premain run, and later load this size from the AOT cache during the
> product run [5].
>
> The store-load mechanism for inlined size can help reduce the overhead
> of recomputing actual sizes, but it does not provide the inliner with
> much additional information about the callee, since the compilation
> order in the product run generally follows that of the premain run, even
> if not exactly.
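>
> In pseudocode, the store/load idea amounts to something like the sketch
> below. The cache map and helper names here are placeholders for whatever
> JEP 515 plumbing the real patch uses; they are not the actual API.
>
> ```
> #include <string>
> #include <unordered_map>
>
> // Stand-in for the AOT cache: estimated inlined size keyed by method id.
> static std::unordered_map<std::string, int> g_aot_cached_inline_sizes;
>
> int inline_instructions_size_with_cache(const std::string& method_id,
>                                         int (*compute_from_c2_nmethod)(const std::string&),
>                                         bool is_training_run) {
>   // Product run: reuse the size recorded during the premain/training run
>   // instead of recomputing it from a fresh C2 compilation.
>   auto it = g_aot_cached_inline_sizes.find(method_id);
>   if (it != g_aot_cached_inline_sizes.end()) return it->second;
>
>   // Training run (or cache miss): compute from the compiled code and record
>   // it so the product run can load it later.
>   int size = compute_from_c2_nmethod(method_id);
>   if (is_training_run) g_aot_cached_inline_sizes[method_id] = size;
>   return size;
> }
> ```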
>
> To give the inliner more profiled information about callees, I tried
> another simple draft that records inlined sizes for more C2-compiled
> methods:
>
> https://github.com/openjdk/jdk/pull/27519/commits/ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d
>
> However, with this draft using the AOT cache, I did not observe any
> significant code size changes for any of the workloads. This may require
> further investigation.
>
> [4] https://openjdk.org/jeps/515
>
> [5] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ciMethod.cpp#L1152
>
> *6. Questions*
>
> 1) Relation to Project Leyden
>
> Project Leyden aims to enhance the AOT cache to store compiled code from
> training runs [6]. This suggests that we may eventually prefer to load
> compiled code directly from the AOT cache rather than rely solely on JIT
> compilation.
>
> Given this direction, is it still worthwhile to invest further in using
> profiled method sizes as a means to improve inlining heuristics?
>
> Could such profiling provide complementary benefits even if compiled
> code is cached?
>
> 2) Global profiling information for C2
>
> Should we consider leveraging profiled information stored in the AOT
> cache to give the C2 inliner a broader, more global view of methods,
> enabling better inlining decisions?
>
> For example, could global visibility into method sizes and call sites
> help address pathological cases of code bloat or missed optimization
> opportunities? [7]
>
> [6] https://openjdk.org/jeps/8335368
>
> [7] https://wiki.openjdk.org/display/hotspot/inlining
>
> I'd greatly appreciate any feedback. Thank you for your time and
> consideration.
>
> Thanks,
>
> Fei
>