RFR: 8368465: [leyden] Improve precompiler method selection code [v3]
Aleksey Shipilev
shade at openjdk.org
Wed Dec 3 13:58:47 UTC 2025
On Wed, 3 Dec 2025 13:27:39 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> Actually... Now I see we generate, and thus use, substantially more A2 code! This also aligns with the performance data: we have way fewer C1 compilations with this patch.
>>
>>
>> # ==== Baseline
>>
>> Create:
>> [4.593s][info][precompile] Precompilation for level 1 finished (94 successful out of 94 total)
>> [4.604s][info][precompile] Precompilation for level 2 finished (131 successful out of 131 total)
>> [4.814s][info][precompile] Precompilation for level 2 finished (1852 successful out of 1852 total)
>> [6.035s][info][precompile] Precompilation for level 4 finished (1660 successful out of 1660 total)
>> [4.589s][info][precompile] Precompilation for level 5 finished (1660 successful out of 1660 total)
>>
>> Use:
>> Tier1 {speed: 42838.159 bytes/s; standard: 0.014 s, 582 bytes, 135 methods; ...}
>> Tier2 {speed: 210303.802 bytes/s; standard: 0.311 s, 63857 bytes, 817 methods; ...}
>> Tier3 {speed: 134013.414 bytes/s; standard: 0.035 s, 4685 bytes, 245 methods; ...}
>> Tier4 {speed: 69205.374 bytes/s; standard: 0.051 s, 3225 bytes, 13 methods; ...}
>> AOT Code T1 {speed: 297580.645 bytes/s; standard: 0.001 s, 369 bytes, 94 methods; ...}
>> AOT Code T2 {speed: 5654043.587 bytes/s; standard: 0.042 s, 237861 bytes, 1969 methods; ...}
>> AOT Code T4 {speed: 25219362.296 bytes/s; standard: 0.029 s, 737408 bytes, 927 methods; ...}
>> AOT Code T5 {speed: 30793594.418 bytes/s; standard: 0.048 s, 1474270 bytes, 1658 methods; ...}
>>
>>
>> # ==== Patched
>>
>> Create:
>> [3.984s][info][precompile] Precompilation for level 1 finished (311 successful out of 311 total)
>> [4.382s][info][precompile] Precompilation for level 2 finished (2752 successful out of 2752 total)
>> [4.383s][info][precompile] Precompilation for level 3 finished (0 successful out of 0 total)
>> [5.392s][info][precompile] Precompilation for level 4 finished (1641 successful out of 1641 total)
>> [3.972s][info][precompile] Precompilation for level 5 finished (1641 successful out of 1641 total)
>>
>> Use:
>> Tier1 {speed: 0.000 bytes/s; standard: 0.000 s, 0 bytes, 0 methods; ...
>> Tier2 {speed: 579987.470 bytes/s; standard: 0.026 s, 15526 bytes, 44 methods; ...
>> Tier3 {speed: 181499.273 bytes/s; standard: 0.026 s, 4761 bytes, 254 methods; ...
>> Tier4 {speed: 77265.133 bytes/s; standard: 0.027 s, 2087 bytes, 12 methods; ...
>> AOT Code T1 {speed: 432360.583 bytes/s; standard: 0.002 s, 942 bytes, 228 methods; ...
>> AOT Code T2 {speed: 6664604.248 bytes/s; standard: 0.042...
>
> LOL, I think I found the performance bug in the original code that I fixed by accident. `MTD::highest_level()` includes the cases when the method is inlined. For T2/T3 code, it would return `4` if we ended up inlining that method into T4 code, which would fail the inclusion check for A2 compilation even though we did have a legit top-level T2/T3 compile. I would say the fact that we have inlined a T2/T3 method into _some_ T4 code should _not_ disqualify A2 compilation.
>
> This change alone gives the same kind of performance boost as my patch:
>
>
> diff --git a/src/hotspot/share/compiler/precompiler.cpp b/src/hotspot/share/compiler/precompiler.cpp
> index 04f95857a63..8a5da803b04 100644
> --- a/src/hotspot/share/compiler/precompiler.cpp
> +++ b/src/hotspot/share/compiler/precompiler.cpp
> @@ -84,7 +84,7 @@ class PrecompileIterator : StackObj {
>
>    static int compile_id(Method* m, int level) {
>      MethodTrainingData* mtd = m->method_holder()->is_loaded() ? MethodTrainingData::find(methodHandle(Thread::current(), m)) : nullptr;
> -    if (mtd != nullptr && mtd->highest_level() == level) {
> +    if (mtd != nullptr && mtd->highest_top_level() == level) {
>        CompileTrainingData* ctd = mtd->last_toplevel_compile(level);
>        if (ctd != nullptr) {
>          return ctd->compile_id();
>
>
> This thing is confusing, and yet another reason why I think this PR is more understandable: it very explicitly checks `MTD::highest_top_level()` when deciding whether to accept the method. It does not have this bug, and its selection also does not implicitly depend on filtering by `compile_id() < INT_MAX`.
Reverted the sorting back to compile IDs instead of counters/size, as the sort order does not really affect performance.
-------------
PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2585220837