RFR: 8355769: Optimize nmethod dependency recording
Aleksey Shipilev
shade at openjdk.org
Tue Apr 29 08:48:20 UTC 2025
On Mon, 28 Apr 2025 18:01:42 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and the newly installed `nmethod`. In `DependencyContext::add_dependent_nmethod`, we linearly scan the dependencies list to see if the `nmethod` is already in it. This costs quite a bit, especially with lots of compiled methods per IK.
>
> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example, in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost.
>
> Fortunately, the way we record nmethod dependencies allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding the new `nmethod` to the various dependency lists, each of those lists is only ever in one of two states: the `nmethod` is not in the chain at all (no need to scan!), or the `nmethod` is at the head of the chain (no need to scan!).
>
> Additional testing:
> - [x] Ad-hoc benchmarks
> - [x] Linux x86_64 server fastdebug, `all`
> - [x] Linux AArch64 server fastdebug, `all`
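The head-only check described in the quoted text can be sketched in isolation. This is a simplified model under stated assumptions, not the actual HotSpot change: `NMethod`, `Bucket`, `DepContext`, and `add_dependent` are hypothetical stand-ins for `nmethod`, the list node, `DependencyContext`, and `add_dependent_nmethod`, and the sketch assumes the caller serializes all insertions (the role `CodeCache_lock` plays in the description above).

```cpp
#include <cstddef>

// Hypothetical stand-in for HotSpot's nmethod.
struct NMethod { int id; };

// Hypothetical singly linked list node holding one dependent nmethod.
struct Bucket {
  NMethod* nm;
  Bucket*  next;
};

// Hypothetical stand-in for DependencyContext: one list per IK/CallSite.
class DepContext {
  Bucket* _head = nullptr;
  size_t  _len  = 0;

public:
  ~DepContext() {
    while (_head != nullptr) {
      Bucket* b = _head;
      _head = b->next;
      delete b;
    }
  }

  // Assumes all lists are updated for one nmethod at a time under a single
  // lock. Then the nmethod being recorded is either absent from this list
  // or was just prepended to it, so checking the head replaces a full scan.
  // Returns true if a new bucket was added.
  bool add_dependent(NMethod* nm) {
    if (_head != nullptr && _head->nm == nm) {
      return false;               // already recorded at the head
    }
    _head = new Bucket{nm, _head}; // prepend in O(1)
    ++_len;
    return true;
  }

  size_t length() const { return _len; }
};
```

Recording the same `NMethod` twice in a row on one `DepContext` adds a single bucket, regardless of how long the chain already is. The shortcut is only sound under the stated assumption: if recording for two different nmethods were interleaved, a duplicate could sit deeper in the chain and the head check would miss it.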
A crude way to show the impact of this in mainline: compile lots of simple methods.
$ taskset -c 0-7 hyperfine -w 3 -r 10 \
"build/linux-x86_64-server-release/images/jdk/bin/java -Xcomp -XX:TieredStopAtLevel=1 -XX:-Inline Hello.java"
# Before
Time (mean ± σ): 1.510 s ± 0.007 s [User: 1.386 s, System: 0.138 s]
Range (min … max): 1.501 s … 1.526 s 10 runs
# After
Time (mean ± σ): 1.504 s ± 0.007 s [User: 1.378 s, System: 0.140 s]
Range (min … max): 1.494 s … 1.516 s 10 runs
On Leyden, with well-trained javac runs, the improvement is clearer, about 9 ms (roughly 2.6%) off the mean:
$ hyperfine -w 10 -r 30 "build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:AOTCache=app.aot JavacBenchApp 50"
# Before
Time (mean ± σ): 340.0 ms ± 4.4 ms [User: 678.1 ms, System: 121.0 ms]
Range (min … max): 329.8 ms … 351.0 ms 30 runs
# After
Time (mean ± σ): 331.2 ms ± 4.0 ms [User: 659.6 ms, System: 119.4 ms]
Range (min … max): 324.3 ms … 338.7 ms 30 runs
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2836097346
PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2836099693