RFR: 8369735: [Leyden] AOT compiled methods have lower peak performance [v2]

Thu Oct 16 14:59:09 UTC 2025

On Thu, 16 Oct 2025 14:45:01 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This should get us back on track with peak performance.
>> 
>> Recently, I fixed the regression in compilation times: https://github.com/openjdk/leyden/commit/b8cfee49c6cb60225b13cbe3dc57a37975125385 -- caused by https://github.com/openjdk/leyden/commit/7b7648a4c9f67be509c6fccbcbc0502648388fdc. In doing so, I dropped the `is_initialized()` check from `notice_jit_observation()`, because I noticed it filtered out too many dependencies. The net result is that our dependencies are now _overly conservative_, which means our JIT compilation times are great, but we are stuck in AP4 code without switching to A4. I reinstated the check.
>> 
>> _That_ gets us back to the original regression I had been fixing (https://github.com/openjdk/leyden/commit/7b7648a4c9f67be509c6fccbcbc0502648388fdc), so I took another look at what goes wrong there. I think I figured it out: most of the uncommon traps come from `invokedynamic` call sites that get uncommon-trapped (at `Parse::can_not_compile_call_site`) in A4 code, I believe because LF invokers (like `j.l.invoke.Invokers$Holder`) are _not fully initialized_ in the assembly phase! Oops. So I added code that reports core JLI classes as fully initialized for A4 code, which I believe they are in the production run. This seems to stop the premature AP4 -> A4 switch in all workloads I tried.
>> 
>> Plus, I took Vladimir's patch to replace AP4 code with A4 code even when there are no clinit barriers. We reasoned it is the right thing to do, because AP4 is compiled without uncommon traps and other speculations. This whole ordeal shows that AP4 is indeed significantly slower than A4.
>> 
>> Plus, in Spring Boot Petclinic I noticed that we are still JIT compiling quite a bit of code, which meant that the `SkipTier2IfPossible` opt-in we recently did (https://github.com/openjdk/leyden/commit/3be99e0087c3588693dc7ce0f8d0a0b860ecbbd4) made warmup significantly worse there, as we waited for C2 compilations to complete. I took that block out again. We can do this separately, but the improvements this PR gives are on par with the gains from more reasonable tiered compilation.
>> 
>> Additional testing:
>>  - [x] Benchmarks, see comments
>>  - [x] Linux x86_64 server fastdebug, `runtime/cds`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Includes

Other benchmarks seem to respond well. We expect _some_ regression on pure startup tests, since we are not locked into AP4 anymore.

quarkus-getting-started (4 cores):

Run,Old CDS + AOT,New CDS + AOT
1,147,148
2,139,142
3,145,138
4,148,145
5,140,136
6,149,137
7,143,139
8,140,144
9,139,139
10,142,139
Geomean,143.15,140.65 (1.02x improvement)

micronaut-first-app (4 cores):

Run,Old CDS + AOT,New CDS + AOT
1,139,142
2,138,141
3,138,139
4,137,141
5,139,140
6,138,141
7,139,143
8,138,142
9,139,142
10,140,141
Geomean,138.50,141.20 (0.98x improvement)
Stdev,0.81,1.08

spring-boot-getting-started (4 cores):

Run,Old CDS + AOT,New CDS + AOT
1,204,195
2,203,197
3,204,197
4,206,198
5,204,197
6,204,198
7,204,198
8,203,197
9,205,198
10,206,195
Geomean,204.30,197.00 (1.04x improvement)
Stdev,1.00,1.10

spring-petclinic (4 cores):

Run,Old CDS + AOT,New CDS + AOT
1,1235,1167
2,1245,1168
3,1237,1162
4,1220,1187
5,1234,1164
6,1236,1176
7,1247,1157
8,1256,1172
9,1253,1176
10,1254,1172
Geomean,1241.65,1170.07 (1.06x improvement)
Stdev,10.73,8.07

helidon-quickstart-se:

Run,Old CDS + AOT,New CDS + AOT
1,99,99
2,101,98
3,102,98
4,99,98
5,100,100
6,99,99
7,100,101
8,101,101
9,100,97
10,98,98
Geomean,99.89,98.89 (1.01x improvement)
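
For anyone who wants to re-check the summary rows, here is a minimal Python sketch, not the tooling actually used to produce these tables. It assumes that lower numbers are better, that the "improvement" factor is the old geomean divided by the new one, and that the Stdev rows are population standard deviations. The function names are made up for illustration. The data is the spring-boot-getting-started table copied from this message.

```python
import math
import statistics


def geomean(samples):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(x) for x in samples) / len(samples))


def summarize(old, new):
    """Return (old geomean, new geomean, old/new ratio).

    Assumption: lower is better, so a ratio above 1.0 means the new
    build improved and a ratio below 1.0 means it regressed.
    """
    g_old, g_new = geomean(old), geomean(new)
    return g_old, g_new, g_old / g_new


# spring-boot-getting-started (4 cores), runs 1..10 from the table.
old_runs = [204, 203, 204, 206, 204, 204, 204, 203, 205, 206]
new_runs = [195, 197, 197, 198, 197, 198, 198, 197, 198, 195]

g_old, g_new, ratio = summarize(old_runs, new_runs)
# Assumption: population standard deviation (pstdev), not sample stdev.
sd_old = statistics.pstdev(old_runs)
sd_new = statistics.pstdev(new_runs)

print(f"Geomean,{g_old:.2f},{g_new:.2f} ({ratio:.2f}x improvement)")
print(f"Stdev,{sd_old:.2f},{sd_new:.2f}")
```

Under those assumptions the two printed lines match the Geomean and Stdev rows of that table. Swapping in the run lists from another table gives its summary the same way, including a ratio below 1.0 for micronaut-first-app.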

-------------

PR Comment: https://git.openjdk.org/leyden/pull/103#issuecomment-3411324895