RFR: Experiment with storing target method for static and opt-virtual callsites in reloc info

Vladimir Ivanov vlivanov at openjdk.org
Wed Dec 10 21:16:38 UTC 2025


On Tue, 9 Dec 2025 21:15:23 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

> This work aims to reduce the time taken to perform call resolution by caching the result of direct calls (static and opt-virtual) in the reloc info during compilation of a method. 
> Relocations for static and opt-virtual calls already have a field `method_index` which is used to store the "real" method to be invoked by the method handle. It is currently only used during C2 compilations.
> This patch reuses the `method_index` field for static and opt-virtual calls to store the target method. The runtime call (`SharedRuntime::resolve_helper`) used by compiled code to perform call site resolution can then shortcut the resolution process by reading the target method from the reloc info and patching the callsite through CompiledDirectCall.
> No special handling is needed for AOT code.
> 
> On a 4-CPU system there is an improvement of around 3% in `spring-boot-getting-started`. For JavacBench the improvement ranges from 0% to 3%.
> 
> `spring-boot-getting-started`:
> 
> Run       Old CDS + AOT   New CDS + AOT
> 1         199             192
> 2         199             196
> 3         202             197
> 4         203             198
> 5         198             196
> 6         201             194
> 7         203             197
> 8         200             193
> 9         204             193
> 10        199             201
> Geomean   200.79          195.68 (1.03x improvement)
> Stdev     1.99            2.61
> 
> 
> `-Xlog:init` shows the numbers for time spent in call resolution from the compiled code.
> For `spring-boot-getting-started` before this patch:
> 
> [0.357s][info][init] SharedRuntime:
> [0.357s][info][init]   resolve_opt_virtual_call:      8260us /  2249 events
> [0.357s][info][init]   resolve_virtual_call:          6899us /  1297 events
> [0.357s][info][init]   resolve_static_call:           4646us /  1723 events
> [0.357s][info][init]   handle_wrong_method:            680us /   145 events
> [0.357s][info][init]   ic_miss:                       2109us /   488 events
> [0.357s][info][init] Total:                      22596us
> [0.357s][info][init]   perf_resolve_static_cache_hit_ctr:     0
> [0.357s][info][init]   perf_resolve_opt_virtual_cache_hit_ctr:     0
> 
> 
> For `spring-boot-getting-started` after this patch:
> 
> [0.348s][info][init] SharedRuntime:
> [0.348s][info][init]   resolve_opt_virtual_call:      2774us /  2251 events
> [0.348s][info][init]   resolve_virtual_call:          5577us /  1294 events
> [0.348s][info][init]   resolve_static_call:           1901us /  1728 events
> [0.348s][info][init]   handle_wrong_method:            719us /   146 events
> [0.348s][info][init]   ic_miss:                       2109us /   474 events
> [0.348s][info][init] Total:                      13082us
> [0.348s][info][init]   perf_resolve_static_cache_hit_ctr:  1704
> ...

Nice work!
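
For readers following along, the idea in the summary above can be modeled with a small standalone sketch. This is a toy model only: `Method`, `CallReloc`, `CompiledCode`, and `Resolver` are hypothetical stand-ins rather than HotSpot types, and the name-keyed table merely plays the role of the regular (slower) resolution path. It assumes an index of 0 means "no method attached" and shows the resolve helper preferring the target recorded at compile time before patching the call site.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are HotSpot types.
struct Method { std::string name; };

struct CallReloc {
    // Assumption of this sketch: 0 means "no method attached"; otherwise a
    // 1-based index into the owning compiled code's metadata table.
    int method_index = 0;
    const Method* patched_target = nullptr;  // models the patched call site
};

struct CompiledCode {
    std::vector<const Method*> metadata;

    // "Compile time": record the statically known callee of a direct call
    // and return the index to store in the call's relocation.
    int attach(const Method* m) {
        metadata.push_back(m);
        return static_cast<int>(metadata.size());
    }

    const Method* method_at(int index) const {
        return index == 0 ? nullptr : metadata[index - 1];
    }
};

struct Resolver {
    std::unordered_map<std::string, Method> table;  // models the slow path
    unsigned cache_hits = 0;
    unsigned slow_lookups = 0;

    // Models the runtime resolve helper: use the target stored in the
    // relocation when present, otherwise fall back to a full lookup.
    // Either way, the call site is patched with the result.
    const Method* resolve(const CompiledCode& code, CallReloc& site,
                          const std::string& symbolic_name) {
        const Method* target = code.method_at(site.method_index);
        if (target != nullptr) {
            ++cache_hits;
        } else {
            ++slow_lookups;
            target = &table.at(symbolic_name);
        }
        site.patched_target = target;
        return target;
    }
};
```

In this model, a call site whose relocation carries an attached method never touches `table`, which loosely corresponds to the cache-hit counters in the `-Xlog:init` output.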

src/hotspot/share/runtime/sharedRuntime.cpp line 114:

> 112: PerfTickCounters* SharedRuntime::_perf_ic_miss_total_time             = nullptr;
> 113: 
> 114: uint SharedRuntime::_perf_resolve_static_cache_hit_ctr = 0;

PerfCounters are usually more convenient to use than raw counters. For example, they can be sampled on the fly from a live process.

src/hotspot/share/runtime/sharedRuntime.cpp line 1528:

> 1526: 
> 1527:   if (UseNewCode2) {
> 1528:     bool is_mhi;

I believe disabling inlining through MH linkers when generating archived code should simplify things. Then, there should be no attached methods for MH linkers in archived code and vice versa.

-------------

PR Review: https://git.openjdk.org/leyden/pull/106#pullrequestreview-3564491705
PR Review Comment: https://git.openjdk.org/leyden/pull/106#discussion_r2608217693
PR Review Comment: https://git.openjdk.org/leyden/pull/106#discussion_r2608224195


More information about the leyden-dev mailing list