RFR: 8368465: [leyden] Improve precompiler method selection code [v3]

Wed Dec 3 13:58:47 UTC 2025

On Wed, 3 Dec 2025 13:27:39 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Actually... Now I see we generate, and thus use, substantially more A2 code! This also aligns with the performance data: we have way fewer C1 compilations with this patch.
>> 
>> 
>> # ==== Baseline
>> 
>> Create:
>>   [4.593s][info][precompile] Precompilation for level 1 finished (94 successful out of 94 total)
>>   [4.604s][info][precompile] Precompilation for level 2 finished (131 successful out of 131 total)
>>   [4.814s][info][precompile] Precompilation for level 2 finished (1852 successful out of 1852 total)
>>   [6.035s][info][precompile] Precompilation for level 4 finished (1660 successful out of 1660 total)
>>   [4.589s][info][precompile] Precompilation for level 5 finished (1660 successful out of 1660 total)
>> 
>> Use:
>>   Tier1 {speed: 42838.159 bytes/s; standard:  0.014 s, 582 bytes, 135 methods; ...}
>>   Tier2 {speed: 210303.802 bytes/s; standard:  0.311 s, 63857 bytes, 817 methods; ...}
>>   Tier3 {speed: 134013.414 bytes/s; standard:  0.035 s, 4685 bytes, 245 methods; ...}
>>   Tier4 {speed: 69205.374 bytes/s; standard:  0.051 s, 3225 bytes, 13 methods; ...}
>>   AOT Code T1 {speed: 297580.645 bytes/s; standard:  0.001 s, 369 bytes, 94 methods; ...}
>>   AOT Code T2 {speed: 5654043.587 bytes/s; standard:  0.042 s, 237861 bytes, 1969 methods; ...}
>>   AOT Code T4 {speed: 25219362.296 bytes/s; standard:  0.029 s, 737408 bytes, 927 methods; ...}
>>   AOT Code T5 {speed: 30793594.418 bytes/s; standard:  0.048 s, 1474270 bytes, 1658 methods; ...}
>> 
>> 
>> # ==== Patched
>> 
>> Create:
>>   [3.984s][info][precompile] Precompilation for level 1 finished (311 successful out of 311 total)
>>   [4.382s][info][precompile] Precompilation for level 2 finished (2752 successful out of 2752 total)
>>   [4.383s][info][precompile] Precompilation for level 3 finished (0 successful out of 0 total)
>>   [5.392s][info][precompile] Precompilation for level 4 finished (1641 successful out of 1641 total)
>>   [3.972s][info][precompile] Precompilation for level 5 finished (1641 successful out of 1641 total)
>> 
>> Use:
>>   Tier1 {speed:  0.000 bytes/s; standard:  0.000 s, 0 bytes, 0 methods; ...
>>   Tier2 {speed: 579987.470 bytes/s; standard:  0.026 s, 15526 bytes, 44 methods; ...
>>   Tier3 {speed: 181499.273 bytes/s; standard:  0.026 s, 4761 bytes, 254 methods; ...
>>   Tier4 {speed: 77265.133 bytes/s; standard:  0.027 s, 2087 bytes, 12 methods; ...
>>   AOT Code T1 {speed: 432360.583 bytes/s; standard:  0.002 s, 942 bytes, 228 methods; ...
>>   AOT Code T2 {speed: 6664604.248 bytes/s; standard:  0.042...
>
> LOL, I think I found the performance bug in the original code that I fixed by accident. `MTD::highest_level()` also counts the cases where the method is inlined. For T2/T3 code, it would return `4` if we ended up inlining that method into T4 code. That would fail the inclusion check for A2 compilation, even though we did have a legit top-level T2/T3 compile. I would say the fact that we have inlined a T2/T3 method into _some_ T4 code should _not_ disqualify A2 compilation.
> 
> This change alone gives the same kind of performance boost as my patch:
> 
> 
> diff --git a/src/hotspot/share/compiler/precompiler.cpp b/src/hotspot/share/compiler/precompiler.cpp
> index 04f95857a63..8a5da803b04 100644
> --- a/src/hotspot/share/compiler/precompiler.cpp
> +++ b/src/hotspot/share/compiler/precompiler.cpp
> @@ -84,7 +84,7 @@ class PrecompileIterator : StackObj {
>  
>    static int compile_id(Method* m, int level) {
>      MethodTrainingData* mtd = m->method_holder()->is_loaded() ? MethodTrainingData::find(methodHandle(Thread::current(), m)) : nullptr;
> -    if (mtd != nullptr && mtd->highest_level() == level) {
> +    if (mtd != nullptr && mtd->highest_top_level() == level) {
>        CompileTrainingData* ctd = mtd->last_toplevel_compile(level);
>        if (ctd != nullptr) {
>          return ctd->compile_id();
> 
> 
> This thing is confusing, and it is yet another reason why I think this PR is more understandable: it very explicitly checks `MTD::highest_top_level()` when deciding whether to accept the method. It does not have this bug, and it also does not implicitly participate in filtering by `compile_id() < INT_MAX`.

Reverted the sorting back to compile IDs instead of counters/size, as it does not affect performance, really.
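
For readers following along, the `highest_level()` vs `highest_top_level()` difference described in the quote can be sketched with a toy model. This is only an illustration under stated assumptions: `CompileRecord`, `ToyMethodData`, and the two `accept_*` helpers are hypothetical names made up here, not the HotSpot `MethodTrainingData` API, and the real training data is structured differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of one compilation in which a method took part.
// This is NOT a HotSpot type; it only models "root vs inlined".
struct CompileRecord {
  int  level;      // tier of the compilation (1..4)
  bool top_level;  // true: the method was the compilation root
                   // false: the method was inlined into another root
};

// Hypothetical stand-in for per-method training data.
struct ToyMethodData {
  std::vector<CompileRecord> records;

  // Highest tier the method was seen at, inlined or not.
  int highest_level() const {
    int best = 0;
    for (const CompileRecord& r : records) {
      best = std::max(best, r.level);
    }
    return best;
  }

  // Highest tier at which the method itself was the compilation root.
  int highest_top_level() const {
    int best = 0;
    for (const CompileRecord& r : records) {
      if (r.top_level) {
        best = std::max(best, r.level);
      }
    }
    return best;
  }
};

// Inclusion check modelled on the line removed by the quoted diff.
bool accept_by_highest_level(const ToyMethodData& d, int level) {
  return d.highest_level() == level;
}

// Inclusion check modelled on the line added by the quoted diff.
bool accept_by_highest_top_level(const ToyMethodData& d, int level) {
  return d.highest_top_level() == level;
}

// A method with a top-level T2 compile that was also inlined into T4 code.
void example() {
  ToyMethodData d{{{2, true}, {4, false}}};
  assert(!accept_by_highest_level(d, 2));     // rejected for level 2
  assert(accept_by_highest_top_level(d, 2));  // accepted for level 2
}
```

In this model, a method with a legit top-level T2 compile is rejected by the first check only because it was also inlined into some T4 compile, which is the case the quoted one-line change addresses.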

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2585220837