Initialization code that never got trained
María Arias de Reyna Dominguez
mariasde at redhat.com
Wed Feb 4 09:14:14 UTC 2026
Hi!
On Tue, Feb 3, 2026 at 8:40 PM Vladimir Kozlov <vladimir.kozlov at oracle.com>
wrote:
> Thank you, María, for your report.
>
> > (for some reason, the log didn't say anything about any nmethod in
> > the codecache)
>
> I just checked the latest premain build and it shows nmethods.
> What command lines did you use?
>
Training: java -XX:+PrintCompilation
-agentpath:/..../libasyncProfiler.so=start,event=cpu,file=.....profile.html
-XX:AOTCacheOutput=...../sqpc-quarkus-uberjar-app.aot -Xcomp
-Xlog:aot+map=trace,aot+map+oops=trace:file=......-aot.map:none:filesize=0
-Xlog:class+load=info,aot+resolve*=trace,aot+codecache+exit=debug,aot=warning:file=......training.log:level,tags
-jar file.jar
Production: java -XX:+PrintCompilation
-agentpath:/..../libasyncProfiler.so=start,event=cpu,file=......profile.html
-XX:AOTCache=...../sqpc-quarkus-uberjar-app.aot
-Xlog:class+load=info,aot+resolve*=trace,aot+codecache+exit=debug,aot=warning:file=./.....aot.log:level,tags
-jar file.jar
(removed the paths)
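For readability, the same flow condensed (file names below are placeholders,
and the profiler agent and extra -Xlog options are dropped):
Training with -Xcomp:     java -Xcomp -XX:AOTCacheOutput=app.aot -jar file.jar
Training without -Xcomp:  java -XX:AOTCacheOutput=app.aot -jar file.jar
Production (both cases):  java -XX:AOTCache=app.aot -jar file.jar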
> JDK26 said that "[warning][aot] The AOT cache was created by a
> different version or build of HotSpot" so I couldn't even use it on my
> experiment.
>
> What command lines did you use for the JDK 26 experiment?
>
Same.
>
> Thanks,
> Vladimir K
>
> On 2/3/26 3:14 AM, María Arias de Reyna Dominguez wrote:
> > Hi again!
> >
> > Comparing native and Java was not as straightforward as I thought...
> > but I decided to just do an experiment: What would happen if I train
> > with "-Xcomp" and force compilation of everything? Would I get some
> > advantage?
> >
> > My hypothesis said yes. Reality had other ideas.
> >
> > This is a simple REST API over Quarkus that calls a database and returns
> > the result of a select. I trained with "-Xcomp" and then ran production
> > without that option, and compared those production runs with what happens
> > if I don't train with -Xcomp.
> > This was done on 2 dedicated CPU cores.
> > But on my laptop, so other things running at the same time may have
> > interfered (like IO or memory usage; who knows, Slack is a beast). But I
> > ran it four times and the results were always similar.
> >
> > JDK26 said that "[warning][aot] The AOT cache was created by a different
> > version or build of HotSpot" so I couldn't even use it on my experiment.
> > Premain (results from a build of
> > 127bfc9b0dd122c78e702867a88e0847ec362e68) didn't throw that error.
> > Probably this is a bug, not a feature, but let's use it!
> >
> > Do we store more stuff in the cache with that option enabled? Yes, we
> > definitely do.
> >
> > [image.png: AOT cache contents, with vs. without -Xcomp]
> >
> > Do we have a faster start-up time with -Xcomp enabled? No, we even get a
> > worse start-up time:
> >
> > [image.png: production start-up time comparison]
> > I decided to take a look at the cache statistics in both Premain runs:
> >
> > With -Xcomp on:
> > [debug][aot,codecache,exit] Adapters: total=725
> > [debug][aot,codecache,exit] Shared Blobs: total=0
> > [debug][aot,codecache,exit] C1 Blobs: total=0
> > [debug][aot,codecache,exit] C2 Blobs: total=0
> > [debug][aot,codecache,exit] AOT code cache size: 894352 bytes, max
> > entry's size: 2208 bytes
> > [info ][aot,codecache,exit] Wrote 725 AOT code entries to AOT Code Cache
> > Classes in AOT Cache: 12,603
> > -> KlassTrainingData: 7,101 (56.34%)
> > Objects in AOT Cache: 149,684
> > -> AOT-inited: 1,261 (0.84%)
> > -> java.lang.Class instances: 12,361 (8.26%)
> > -> java.lang.String instances: 46,320 (30.95%)
> > Methods in AOT Cache: 158,664
> > -> MethodCounters: 38,424 (24.22%)
> > -> MethodData: 33,347 (21.02%)
> > -> MethodTrainingData: 37,619 (23.71%)
> > -> CompileTrainingData:
> > -> Level 1: 552 (0.35%)
> > -> Level 2: 36 (0.02%)
> > -> Level 3: 24,737 (15.59%)
> > -> Level 4: 23,761 (14.98%)
> >
> >
> > Without -Xcomp:
> > [debug][aot,codecache,exit] Adapters: total=724
> > [debug][aot,codecache,exit] Shared Blobs: total=0
> > [debug][aot,codecache,exit] C1 Blobs: total=0
> > [debug][aot,codecache,exit] C2 Blobs: total=0
> > [debug][aot,codecache,exit] AOT code cache size: 893136 bytes, max
> > entry's size: 2208 bytes
> > [info ][aot,codecache,exit] Wrote 724 AOT code entries to AOT Code Cache
> > Classes in AOT Cache: 12,465
> > -> KlassTrainingData: 2,693 (21.60%)
> > Objects in AOT Cache: 149,416
> > -> AOT-inited: 1,250 (0.84%)
> > -> java.lang.Class instances: 12,208 (8.17%)
> > -> java.lang.String instances: 46,458 (31.09%)
> > Methods in AOT Cache: 157,933
> > -> MethodCounters: 11,004 (6.97%)
> > -> MethodData: 7,311 (4.63%)
> > -> MethodTrainingData: 8,794 (5.57%)
> > -> CompileTrainingData:
> > -> Level 1: 1,249 (0.79%)
> > -> Level 2: 947 (0.60%)
> > -> Level 3: 4,784 (3.03%)
> > -> Level 4: 1,154 (0.73%)
> >
> > (for some reason, the log didn't say anything about any nmethod in the
> > codecache)
> >
> > Whatever that argument is doing, it is not helping as I expected.
> >
> > We get many more TrainingData objects, and the CompileTrainingData is
> > done at a higher level. But it doesn't seem to speed up the application,
> > probably because we are busy loading things we are not really going to
> > use?
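> >
> > (If I want to test that hypothesis, a rough way would be to diff the
> > classes loaded in training against the ones loaded in production,
> > assuming the class+load=info lines keep their usual "SomeClass
> > source: ..." shape; the log file names here are made up:
> >
> > grep ' source: ' training.log | awk '{for(i=2;i<=NF;i++) if($i=="source:"){print $(i-1);break}}' | sort -u > trained.txt
> > grep ' source: ' production.log | awk '{for(i=2;i<=NF;i++) if($i=="source:"){print $(i-1);break}}' | sort -u > prod.txt
> > comm -23 trained.txt prod.txt | wc -l
> >
> > The last number would be classes that were loaded only while training.)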
> >
> > So, the conclusion is: don't bother. This looks like a dead end. María,
> > you should have trusted the process: the JVM knows better than you.
> >
> >
> > On Wed, Jan 7, 2026 at 9:18 AM María Arias de Reyna Dominguez
> > <mariasde at redhat.com <mailto:mariasde at redhat.com>> wrote:
> >
> > Hi!
> >
> > Thanks! I will try to take a closer look and see exactly what is
> > happening.
> >
> > Right now, a comparison of Quarkus native vs Quarkus Leyden (JDK 26
> > main, latest) is close to six or seven times faster on the tests I
> > have done. But that may be test-dependent, so I have to dig further.
> >
> > On Sun, Jan 4, 2026 at 6:23 PM Dan Heidinga <dan.heidinga at oracle.com
> > <mailto:dan.heidinga at oracle.com>> wrote:
> >
> > Happy new year!
> >
> > > For example: a REST API. It has some initialization, port
> > > opening, reading configurations, etc... that run only once. So the
> > > code will never be trained. But it always runs at startup,
> > > impacting the time to first response.
> >
> > Historically, JVMs have looked at run-once code - like the body
> > of <clinit> - as not being worth compiling as the return on the
> > investment in compile time is too low. There have always been
> > exceptions, but even template-style JITs have avoided run-once
> > code.
> >
> > Can you quantify how much of the application's startup is spent
> > in these run-once methods?
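> >
> > For a rough split, a flight recording of the first seconds plus a
> > look at the execution samples might already be enough (file names
> > here are only examples):
> >
> > java -XX:StartFlightRecording=duration=60s,filename=startup.jfr -jar app.jar
> > jfr print --events jdk.ExecutionSample startup.jfr
> >
> > or the async-profiler agent you already attach, switched to
> > wall-clock mode (event=wall), so time spent blocked during startup
> > shows up too.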
> >
> > So, how can I tell Leyden to please compile and cache those
> > functions, even if they are going to be run just once, even if
> > they are not optimized at all, even if those compilations can
> > get discarded after a couple of seconds?
> >
> > Compiling the code isn’t enough. There’s a lot of work with
> > careful timing required to get the code ready for use before the
> > first invocation. If we miss that window, then the compiled
> > code is just overhead.
> >
> > For “expensive” or long running single use code, we may be able
> > to precompile with C1 and get out of the interpreter earlier at
> > the cost of some coordination overhead to ensure the methods are
> > installed immediately.
> >
> > I think we’d need to understand better where the time is being
> > spent to see why this run-once code is slowing down startup.
> >
> > —Dan
> >
> > *From: *leyden-dev <leyden-dev-retn at openjdk.org <mailto:leyden-
> > dev-retn at openjdk.org>> on behalf of María Arias de Reyna
> > Dominguez <mariasde at redhat.com <mailto:mariasde at redhat.com>>
> > *Date: *Tuesday, December 30, 2025 at 4:13 AM
> > *To: *leyden-dev <leyden-dev at openjdk.org <mailto:leyden-
> > dev at openjdk.org>>
> > *Subject: *Initialization code that never got trained
> >
> > Happy New Year!
> >
> > I have been doing some experiments with Leyden and realized
> > something: there is some code at startup/initialization that
> > never gets optimized but is impacting startup and warmup time.
> >
> > This realization came while doing comparisons with native/
> > GraalVM images of the same code.
> >
> > For example: a REST API. It has some initialization, port
> > opening, reading configurations, etc... that run only once. So
> > the code will never be trained. But it always runs at startup,
> > impacting the time to first response.
> >
> > In a native image, that code may not be optimized, but at least it
> > is already compiled, not interpreted. Therefore, the native image
> > starts faster.
> >
> > So, how can I tell Leyden to please compile and cache those
> > functions, even if they are going to be run just once, even if
> > they are not optimized at all, even if those compilations can
> > get discarded after a couple of seconds?
> >
> > Or are we just going to assume that that code, which is
> > impacting startup time, doesn't need to be pre-compiled because
> > we are focusing only on optimizations made by the JVM at runtime?
> >
> > Kind regards,
> > María Arias de Reyna Domínguez
> > Senior Software Engineer
> > She / Her / Hers
> > ariasdereyna at redhat.com <mailto:ariasdereyna at redhat.com>
> >
>
>