RFR: 8355769: Optimize nmethod dependency recording

Aleksey Shipilev shade at openjdk.org
Tue Apr 29 08:48:20 UTC 2025


On Mon, 28 Apr 2025 18:01:42 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and the newly installed `nmethod`. In `DependencyContext::add_dependent_nmethod`, we linearly scan the dependencies list to see if the `nmethod` is already there. This costs quite a bit, especially with lots of compiled methods per IK.
> 
> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example, in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost.
> 
> Fortunately, the way we record nmethod dependencies allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding the new `nmethod` to the various dependency lists, each of those lists is only ever in one of two states: the `nmethod` is not in the chain at all (no need to scan!), or the `nmethod` is at the head of the chain (no need to scan!).
> 
> Additional testing:
>  - [x] Ad-hoc benchmarks
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`
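
The shortcut described in the quoted text can be sketched as a small standalone C++ model. This is not the HotSpot code: `Method`, `Bucket`, `DependencyList`, `add_with_scan` and `add_head_check` are hypothetical names made up for illustration, and the locking is only assumed (the caller is taken to hold one lock for the whole recording pass, as the description says `CodeCache_lock` is held).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a compiled method; not the HotSpot nmethod class.
struct Method { int id; };

// One entry in a dependency chain.
struct Bucket {
  Method* method;
  Bucket* next;
};

// Simplified model of the dependency list owned by one class or call site.
struct DependencyList {
  Bucket* head = nullptr;

  DependencyList() = default;
  DependencyList(const DependencyList&) = delete;
  DependencyList& operator=(const DependencyList&) = delete;

  ~DependencyList() {
    while (head != nullptr) {
      Bucket* b = head;
      head = head->next;
      delete b;
    }
  }

  // Baseline: scan the whole chain before prepending. O(chain length).
  void add_with_scan(Method* m) {
    for (Bucket* b = head; b != nullptr; b = b->next) {
      if (b->method == m) return;  // already recorded
    }
    head = new Bucket{m, head};
  }

  // Shortcut: inspect only the head. O(1).
  // ASSUMPTION: the caller holds a single lock while it records all
  // dependencies of `m`, and new entries are always prepended. Then `m`
  // is either absent from this list or sitting at its head.
  void add_head_check(Method* m) {
    if (head != nullptr && head->method == m) return;  // already recorded
    head = new Bucket{m, head};
  }

  std::size_t length() const {
    std::size_t n = 0;
    for (Bucket* b = head; b != nullptr; b = b->next) n++;
    return n;
  }
};
```

In this model, calling `add_head_check` twice in a row with the same `Method*` leaves the list length unchanged, which is the only duplicate case that can arise under the stated assumption. If that assumption is broken (for example, another method is prepended between the two calls), the head check would miss the duplicate, so the full scan remains the safe fallback in that case.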

A crude way to show the impact of this in mainline: compile lots of simple methods.


$ taskset -c 0-7 hyperfine -w 3 -r 10 \
  "build/linux-x86_64-server-release/images/jdk/bin/java -Xcomp -XX:TieredStopAtLevel=1 -XX:-Inline Hello.java"

# Before
  Time (mean ± σ):      1.510 s ±  0.007 s    [User: 1.386 s, System: 0.138 s]
  Range (min … max):    1.501 s …  1.526 s    10 runs

# After
  Time (mean ± σ):      1.504 s ±  0.007 s    [User: 1.378 s, System: 0.140 s]
  Range (min … max):    1.494 s …  1.516 s    10 runs

On Leyden, with well-trained javac runs, the impact is clearer, about 9 ms (2.6%) off the mean run time:


$ hyperfine -w 10 -r 30 "build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:AOTCache=app.aot JavacBenchApp 50"

# Before
  Time (mean ± σ):     340.0 ms ±   4.4 ms    [User: 678.1 ms, System: 121.0 ms]
  Range (min … max):   329.8 ms … 351.0 ms    30 runs

# After
  Time (mean ± σ):     331.2 ms ±   4.0 ms    [User: 659.6 ms, System: 119.4 ms]
  Range (min … max):   324.3 ms … 338.7 ms    30 runs

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2836097346
PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2836099693


More information about the hotspot-compiler-dev mailing list