injecting an AOT cache into a JRE+app "deployment artifact"

Tue Aug 19 03:11:37 UTC 2025

On Thu, Aug 14, 2025 at 11:11 PM John Rose <john.r.rose at oracle.com> wrote:
>
> In today’s meeting we discussed a tricky chicken-and-egg problem,
> which is adding an AOT cache into a deployment artifact, where
> the AOT cache came from a training run on (almost) the same
> deployment artifact.  (Almost, but not quite.)
>
> By “deployment artifact” I mean any organization of a JRE plus
> an application classpath (including JARs) plus any other
> dependencies.  It could be a JRE plus some command line
> arguments plus an assurance that the JAR files mentioned
> on the command line will always be available and won’t
> change.
>
> The Hermetic Java project aims at making a single executable
> file that contains the full “deployment artifact”, so it
> can serve as a crisp visualization of what I mean by
> “deployment artifact”.
>
> There are many ways to specify such an artifact, it seems
> to me, including many deployment and packaging facilities
> built to enable cloud computing.  I’ll let others add
> more details about that, if they wish.
>
> In a basic view, it is a JRE plus some app JARs
> (and maybe other libraries) plus some configuration
> information (often viewable as command line options),
> plus any other dependencies, including an optional
> AOT cache (or CDS archive).
>
> Now we add a Leyden principle about training runs,
> that a training run should be as similar as possible
> to the ultimate production run, in order to get an
> AOT cache that is tuned for application behavior
> that is typical during final production.
>
> This takes us to the following puzzle:  To make
> a training run, I need to put together a “deployment
> artifact” that represents, as accurately as possible,
> the actual app (with its JRE and configuration) that
> I intend to deploy “for real”.  But it must lack
> one thing, the AOT cache.  I’m making the training
> run to get that AOT cache.  But when I get it,
> I need to (somehow) retroactively inject the
> AOT cache back into my “deployment artifact”.
>
> The Leyden JEPs make this look pretty simple:
> Just add some more command line options to pull
> in the AOT cache.  And don’t change anything else!
>
> But if there is a complicated pipeline for
> application deployment, and/or a special
> “bundle format” (or even a unified executable)
> needed for deployment, then it seems harder
> to say, in a robust manner, how to tweak the
> “deployment artifact” one way (A) to get the AOT
> cache, and then how to tweak it the opposite
> way (B) by injecting the resulting AOT cache
> into the artifact itself.
>
> I’m making a fuss about this because, depending
> on the details of how much processing and packaging
> is required, it could turn out that those tweaks
> (A) and (B) might perturb the JVM version, JARs
> and/or configuration options enough so that,
> after all that work, the AOT cache does not
> “fit” into the resulting execution.  Instead,
> it detects a configuration mismatch, tragically
> due to its own injection into the final
> “deployment artifact”, and it “falls out”
> of the deployment run.  My concern is to
> make sure this doesn’t happen due to errors
> in packaging.
>
> Here’s an example of what might go wrong.
> Suppose we have a jlink-like command that
> builds a JVM (from sources) to match some
> configuration parameter.  Suppose that JVM
> has an internal version number which is a UID.
> Suppose we build a “deployment artifact” which
> contains such an ad hoc build of the JVM,
> and perform a training run, obtaining an
> AOT cache.  Now, we re-run our packaging
> workflow, this time with the AOT cache.
> Suppose we rebuild the VM (same sources)
> but we get a new ad hoc UID.  Now the
> AOT cache won’t match.  (…Unless we do
> some bug fixing, but I’m concerned about
> robustly getting the right answer without
> bug fixing.)
>
> Beyond the simple command line examples shown
> in the JEPs, I have one suggestion for how to make
> these sorts of things work in a reliable manner,
> and that is to bake Leyden-like workflows into
> jlink.
>
> The jlink command builds a JRE, and it can also
> fold in application JARs and various configuration
> settings (AFAIK).  So we can focus on jlink as
> a venue for building compatible “deployment
> artifacts”, compatible for both the training
> and production, even though the production
> version has an AOT cache in it, and the
> training one does not (or has a little one).
>
> Adding jlink allows a workflow like this:
>
> (a) I run jlink to build a JRE with app JARs
> (b) it gives me DA0 “deployment artifact zero”
> (c) I make a training run using DA0 and get an AOT cache
> (d) I rerun jlink as in (a), except I add the AOT cache
> (e) it gives me DA1 (maybe I handed it DA0 also to edit)
> (f) I make many production runs using DA1
>
> By using jlink twice, in a coordinated manner, I am
> assured that, if my AOT cache ever fails to apply
> to a production run, there is a bug in jlink.
> (Or I have made a production run with an incompatible
> configuration of hardware or GC or whatever, which
> is under my control.)
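
A minimal sketch of how steps (a) through (f) might be scripted, assuming
the JDK 24 AOT cache options (-XX:AOTMode, -XX:AOTConfiguration,
-XX:AOTCache). The names da0, app.jar, app.aotconf, app.aot and
com.example.App are hypothetical. The sketch does not assume any jlink
option for embedding the cache, so it stands in for steps (d) and (e) by
reusing the image from step (a) unchanged and naming the cache on the
command line. It only builds the command lines; it does not run them.

```java
import java.util.List;

/** Builds (but does not execute) the command lines for a train-then-deploy workflow. */
class AotWorkflowSketch {
    // Hypothetical names, chosen only for illustration.
    static final String IMAGE = "da0";             // jlink output directory
    static final String APP_JAR = "app.jar";       // application classpath
    static final String MAIN = "com.example.App";  // application main class
    static final String CONF = "app.aotconf";      // recorded AOT configuration
    static final String CACHE = "app.aot";         // resulting AOT cache

    /** Steps (a)/(b): link a runtime image; the app JAR stays on the classpath. */
    static List<String> jlinkCommand() {
        return List.of("jlink", "--add-modules", "java.base", "--output", IMAGE);
    }

    /** Step (c), part 1: training run that records an AOT configuration. */
    static List<String> recordCommand() {
        return List.of(IMAGE + "/bin/java", "-XX:AOTMode=record",
                "-XX:AOTConfiguration=" + CONF, "-cp", APP_JAR, MAIN);
    }

    /** Step (c), part 2: create the AOT cache from the recorded configuration. */
    static List<String> createCommand() {
        return List.of(IMAGE + "/bin/java", "-XX:AOTMode=create",
                "-XX:AOTConfiguration=" + CONF, "-XX:AOTCache=" + CACHE,
                "-cp", APP_JAR);
    }

    /** Step (f): production run with the same launcher, classpath and main class. */
    static List<String> productionCommand() {
        return List.of(IMAGE + "/bin/java", "-XX:AOTCache=" + CACHE,
                "-cp", APP_JAR, MAIN);
    }

    public static void main(String[] args) {
        for (List<String> cmd : List.of(jlinkCommand(), recordCommand(),
                createCommand(), productionCommand())) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

The point of the sketch is that the launcher, classpath and main class are
the same strings in the training and production commands, so the only
difference between the two runs is the AOT option itself.
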
>
> Does this help?  It partially depends on whether users are
> willing to deploy with the help of jlink.  If not,
> then they are “on the hook” to make sure that the
> AOT cache does not “fall down” when they deploy
> for production.
>
> What about one-file formats, as with Hermetic?
> I think it’s trickier, because DA0 and DA1 are
> two distinct files.  If the AOT cache (in DA1)
> runs a checksum test expecting to see DA0, it
> might fall down when it sees the details of DA1.
> It all depends on how the checksum is organized.
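
To make "how the checksum is organized" concrete, here is a toy model. It
is not how HotSpot validates an AOT cache. It treats a one-file artifact
as a map from entry names to bytes and computes a SHA-256 fingerprint. The
entry names, including app.aot for the cache, are hypothetical. A
fingerprint over the whole file differs between DA0 and DA1, while one
that skips the cache entry is the same for both.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

/** Toy fingerprint of a deployment artifact, modeled as entry name -> bytes. */
class ArtifactFingerprintSketch {
    /** Hypothetical entry name under which the AOT cache is stored. */
    static final String CACHE_ENTRY = "app.aot";

    /** Hashes entry names and contents in sorted order, optionally skipping the cache entry. */
    static String fingerprint(Map<String, byte[]> entries, boolean skipCache) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                if (skipCache && e.getKey().equals(CACHE_ENTRY)) {
                    continue; // the cache must not take part in its own validation
                }
                sha.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                sha.update((byte) 0); // separator between entry name and content
                sha.update(e.getValue());
            }
            return HexFormat.of().formatHex(sha.digest());
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // SHA-256 is available on every Java platform
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // DA0: the artifact used for training, without a cache.
        Map<String, byte[]> da0 = Map.of("libjvm", bytes("vm"), "app.jar", bytes("app"));
        // DA1: the same artifact with the cache injected.
        Map<String, byte[]> da1 = new TreeMap<>(da0);
        da1.put(CACHE_ENTRY, bytes("cache"));

        // Whole-file fingerprints differ, so a check against DA0 would reject DA1.
        System.out.println(fingerprint(da0, false).equals(fingerprint(da1, false))); // false
        // Skipping the cache entry makes DA0 and DA1 indistinguishable.
        System.out.println(fingerprint(da0, true).equals(fingerprint(da1, true)));   // true
    }
}
```
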
>
> Anyway, this is as far as I’ve gotten today
> with this interesting chicken-and-egg problem.
> (Or maybe it’s a Heisenbug, if the presence
> of the observing AOT cache disrupts the expected
> observation?)
>
> I’d love to hear that I’m over-thinking things,
> and that deployment workflows are really not
> that tricky, and that adding training runs
> and AOT caches is straightforward.
>
> (If we try to micro-customize JREs, that adds
> a significant potential cause of AOT cache failure.
> I also worry about re-spinning a one-file artifact.
> Are those the only grounds for me to worry?  Maybe.)
>
> On the positive side, I think we want to make our
> deployment tools (jlink!) more Leyden-aware, so that
> users don’t have to get too creative in managing
> AOT caches.
>
> Thoughts?

John, thanks for the notes. Reading them reminded me of one additional
benefit of hermetic Java that I did not raise during the meeting, and
that would simplify embedding an AOT cache in the final image. A single
executable hermetic image includes the JDK/VM runtime, so the VM and
modules are guaranteed to be compatible with the AOT cache/CDS archive
in that image, as long as the AOT cache/CDS archive is generated by
running with the same libjvm and modules that are linked into the
hermetic image. No related runtime check (e.g. checking the archive
header to verify the modules size) is needed with a hermetic image.
That also slightly helps startup time.

Thanks!
Jiangli

>
> — John