RFR: 8368465: [leyden] Improve precompiler method selection code [v3]

Wed Dec 3 13:32:33 UTC 2025

On Wed, 3 Dec 2025 12:46:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Maybe your change reduced the number of AOT-compiled nmethods in the cache, which allows faster processing.
>> Please run with `-Xlog:aot+codecache+init=debug -XX:+CITime` for the production run to see how many AOT nmethods are in the AOT cache and how many were loaded/used.
>
> Actually... Now I see we generate, and thus use, substantially more A2 code! This also aligns with performance data: we have way fewer C1 compilations with this patch.
> 
> 
> # ==== Baseline
> 
> Create:
>   [4.593s][info][precompile] Precompilation for level 1 finished (94 successful out of 94 total)
>   [4.604s][info][precompile] Precompilation for level 2 finished (131 successful out of 131 total)
>   [4.814s][info][precompile] Precompilation for level 2 finished (1852 successful out of 1852 total)
>   [6.035s][info][precompile] Precompilation for level 4 finished (1660 successful out of 1660 total)
>   [4.589s][info][precompile] Precompilation for level 5 finished (1660 successful out of 1660 total)
> 
> Use:
>   Tier1 {speed: 42838.159 bytes/s; standard:  0.014 s, 582 bytes, 135 methods; ...}
>   Tier2 {speed: 210303.802 bytes/s; standard:  0.311 s, 63857 bytes, 817 methods; ...}
>   Tier3 {speed: 134013.414 bytes/s; standard:  0.035 s, 4685 bytes, 245 methods; ...}
>   Tier4 {speed: 69205.374 bytes/s; standard:  0.051 s, 3225 bytes, 13 methods; ...}
>   AOT Code T1 {speed: 297580.645 bytes/s; standard:  0.001 s, 369 bytes, 94 methods; ...}
>   AOT Code T2 {speed: 5654043.587 bytes/s; standard:  0.042 s, 237861 bytes, 1969 methods; ...}
>   AOT Code T4 {speed: 25219362.296 bytes/s; standard:  0.029 s, 737408 bytes, 927 methods; ...}
>   AOT Code T5 {speed: 30793594.418 bytes/s; standard:  0.048 s, 1474270 bytes, 1658 methods; ...}
> 
> 
> # ==== Patched
> 
> Create:
>   [3.984s][info][precompile] Precompilation for level 1 finished (311 successful out of 311 total)
>   [4.382s][info][precompile] Precompilation for level 2 finished (2752 successful out of 2752 total)
>   [4.383s][info][precompile] Precompilation for level 3 finished (0 successful out of 0 total)
>   [5.392s][info][precompile] Precompilation for level 4 finished (1641 successful out of 1641 total)
>   [3.972s][info][precompile] Precompilation for level 5 finished (1641 successful out of 1641 total)
> 
> Use:
>   Tier1 {speed:  0.000 bytes/s; standard:  0.000 s, 0 bytes, 0 methods; ...
>   Tier2 {speed: 579987.470 bytes/s; standard:  0.026 s, 15526 bytes, 44 methods; ...
>   Tier3 {speed: 181499.273 bytes/s; standard:  0.026 s, 4761 bytes, 254 methods; ...
>   Tier4 {speed: 77265.133 bytes/s; standard:  0.027 s, 2087 bytes, 12 methods; ...
>   AOT Code T1 {speed: 432360.583 bytes/s; standard:  0.002 s, 942 bytes, 228 methods; ...
>   AOT Code T2 {speed: 6664604.248 bytes/s; standard:  0.042 s, 281287 bytes, 2735 methods; ...
>   AOT Code T4 {speed: 26296881.658 bytes/s...

LOL, I think I found the performance bug in the original code that I fixed by accident. `MTD::highest_level()` includes the cases where the method is inlined. For T3 code, it would return `4` if we ended up inlining that method into T4 code. That would fail the inclusion check for A2 compilation, even though we did have a legit top-level T3 compile. I would say the fact that we have inlined T3 in _some_ T4 should _not_ disqualify A2 compilation.

This change alone gives the same kind of performance boost as my patch:

diff --git a/src/hotspot/share/compiler/precompiler.cpp b/src/hotspot/share/compiler/precompiler.cpp
index 04f95857a63..8a5da803b04 100644
--- a/src/hotspot/share/compiler/precompiler.cpp
+++ b/src/hotspot/share/compiler/precompiler.cpp
@@ -84,7 +84,7 @@ class PrecompileIterator : StackObj {
 
   static int compile_id(Method* m, int level) {
     MethodTrainingData* mtd = m->method_holder()->is_loaded() ? MethodTrainingData::find(methodHandle(Thread::current(), m)) : nullptr;
-    if (mtd != nullptr && mtd->highest_level() == level) {
+    if (mtd != nullptr && mtd->highest_top_level() == level) {
       CompileTrainingData* ctd = mtd->last_toplevel_compile(level);
       if (ctd != nullptr) {
         return ctd->compile_id();


This thing is confusing, and it is yet another reason why I think this PR looks more understandable: it very explicitly checks `MTD::highest_top_level()` when deciding whether to accept the method. It does not have this bug, and it also does not implicitly participate in filtering by `compile_id() < INT_MAX`.
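
To make the difference between the two checks concrete, here is a minimal standalone sketch. It is not HotSpot code: `ToyTrainingData`, its fields, and the helper functions are hypothetical stand-ins that only model the behavior described above, where `highest_level()` also counts inlining and `highest_top_level()` counts top-level compiles only. The level value `3` for the check is likewise an assumption for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a per-method training record.
// NOT the real MethodTrainingData; it only models the two queries discussed.
struct ToyTrainingData {
  int top_level_max = 0;  // highest tier at which the method itself was compiled top-level
  int inlined_max   = 0;  // highest tier of any compile that inlined this method

  void note_toplevel_compile(int level) {
    assert(level >= 1 && level <= 4);
    top_level_max = std::max(top_level_max, level);
  }

  void note_inlined_into(int level) {
    assert(level >= 1 && level <= 4);
    inlined_max = std::max(inlined_max, level);
  }

  // Models the described behavior: inlining into a higher tier raises the result.
  int highest_level() const { return std::max(top_level_max, inlined_max); }

  // Only top-level compiles count.
  int highest_top_level() const { return top_level_max; }
};

// Shape of the check before the one-line change.
inline bool accepted_by_old_check(const ToyTrainingData& td, int level) {
  return td.highest_level() == level;
}

// Shape of the check after the one-line change.
inline bool accepted_by_new_check(const ToyTrainingData& td, int level) {
  return td.highest_top_level() == level;
}

// Scenario from the text: a legit top-level T3 compile, and the same method
// later inlined into some T4 compile.
inline ToyTrainingData t3_method_inlined_into_t4() {
  ToyTrainingData td;
  td.note_toplevel_compile(3);
  td.note_inlined_into(4);
  return td;
}
```

In this toy model, the record built by `t3_method_inlined_into_t4()` reports `4` from `highest_level()`, so the old-style check rejects it at level `3`, while the check based on `highest_top_level()` accepts it.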

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2585116344