AppCDS / AOT thoughts based on CLI app experience

Mike Hearn mike at hydraulic.software
Fri Jun 3 08:57:45 UTC 2022


Hi Ioi,

We're using a JDK 17 with a few backports.

Unfortunately the default CDS archive goes missing during jlinking.
It's an easy fix. The product in question is actually a packaging
tool; it's not only for the JVM, but it supports JVM apps quite well,
and re-creating the CDS archive post-jlink is on the list of features
to add. It's packaged with itself, so that'll fix it for our apps too.
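For reference, the fix amounts to regenerating the default archive
inside the jlinked image with -Xshare:dump (the image path below is
just an illustration):

```shell
# Recreate the default CDS archive in a jlinked runtime image.
# "my-runtime" is a hypothetical image directory produced by jlink;
# the archive is written into the image's lib directory.
my-runtime/bin/java -Xshare:dump
```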

> $ javac -J-XX:+AutoCreateSharedArchive -J-XX:SharedArchiveFile=javac.jsa
> HelloWorld.java
>
> javac.jsa will be automatically created if it doesn't exist, or if it's
> not compatible with the JVM (e.g., if you have upgraded to a newer JDK).

Yes, that's a nice improvement in usability. By the way, don't forget
-Xlog:cds=off because otherwise CDS likes to write lots of warnings to
the terminal (not a great look for a CLI app).
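Putting those flags together, a quiet invocation might look like this
(the javac.jsa path is illustrative):

```shell
# Auto-create or refresh the archive while suppressing CDS warnings,
# so nothing extra lands on the terminal of a CLI app.
javac -J-XX:+AutoCreateSharedArchive \
      -J-XX:SharedArchiveFile=javac.jsa \
      -J-Xlog:cds=off \
      HelloWorld.java
```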

> The dynamic CDS dumping happens when the JVM exits.

Yes, but it seems to slow down execution before that point as well.

Here are some timings for our app to parse CLI options, read the
build config, compute the task graph, print the available tasks, and
reach the end of main():

- With CDS off: ~0.8 seconds
- With CDS dumping active: ~1.25 seconds
- With CDS active: ~0.6 seconds

So the app appears to run ~50% slower when dynamic dumping is active,
and that's not including the dump time itself. That's why I'm
suggesting doing it in the background as a totally separate
post-install step (with background forking required for platforms
that don't support or strongly discourage install scripts).
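The post-install step I have in mind is roughly the following sketch
(the app jar, task name and cache path are all hypothetical):

```shell
# Post-install hook (sketch): do one throwaway run with dynamic
# dumping in the background, so interactive runs never pay the
# dumping overhead.
nohup java -XX:ArchiveClassesAtExit="$HOME/.cache/myapp/app.jsa" \
      -jar myapp.jar list-tasks >/dev/null 2>&1 &

# Normal runs then just point at the archive:
java -XX:SharedArchiveFile="$HOME/.cache/myapp/app.jsa" -jar myapp.jar
```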

I get the impression this may not be expected? Is the JVM genuinely
doing extra work at runtime when dynamic dumping is active?

> Maybe we could have some sort of daemon that collects profiling data in
> the background, and update the archives when the application behavior is
> more understood.

Sure, the ideal would be something like an "always dumping" mode in
which there's no slowdown. You just give the JVM a directory (or more
than one) and it caches internal structures, JITed code and
persistent heap snapshots there. Fire and forget. Then, if you want
to trade off bandwidth against first-run time, you can pre-populate
the first directory in the list with the results of a short run (like
just getting to first pixels for a desktop app, or flag handling for
a CLI app), and any additional data generated goes into the second
directory.

Bonus points if you find a way to share those directories over an NFS
mount - then you have a JIT server 'for free' in cloud deployments.

> The dynamic archive will be smaller, because it doesn't need to
> duplicate the built-in classes that are already in the static archive.

Right. That's true. I'd forgotten that you can combine them like that.
So we could ship a small static archive in the download that just
accelerates time-to-first-interaction, and generate a larger dump
client-side in the background that covers the whole execution.
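Concretely, the two archives can be layered by passing both to
-XX:SharedArchiveFile in base:top order (paths illustrative):

```shell
# First run on the client: use the shipped static archive as the
# base and dump a dynamic top archive covering the rest at exit.
java -XX:SharedArchiveFile=base.jsa \
     -XX:ArchiveClassesAtExit=top.jsa -jar app.jar

# Later runs map both, base archive first, then the dynamic top:
java -XX:SharedArchiveFile=base.jsa:top.jsa -jar app.jar
```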

> Will you have a similar problem if the JAR file of the application is
> maliciously modified?

If they're downloaded and stored in the home directory, yes, but JARs
support code signing with per-file hashing, so there's a way to fix
that built into the platform. If they're just shipped as data files
in the app then it doesn't matter, because they're signed and
tamperproofed using OS-specific mechanisms.
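For the downloaded-JAR case, the built-in mechanism is jarsigner,
which records a per-entry digest in the manifest and signs that
(keystore name and alias below are made up):

```shell
# Sign: writes per-file digests into META-INF and signs them.
jarsigner -keystore release.jks \
          -signedjar plugin-signed.jar plugin.jar releasekey

# Verify before loading: fails if any entry was tampered with.
jarsigner -verify -strict plugin-signed.jar
```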

All this is a bit theoretical. IntelliJ downloads unsigned JARs as
plugins and nobody seems to care. Possibly that's because it doesn't
request any special privileges so there's nothing to attack, but on
macOS things as basic as access to ~/Downloads are gated by a
permission these days. Also, JetBrains are moving to code signing
their JARs anyway. So ... yeah. Like I said, hard to know how much to
really care about this. It might be one of those things that doesn't
matter until the day it does.

> What could be modified is the vtptr of archived MetaData objects. They
> usually point to somewhere near 0x800000000 (where the vtables are) but
> the attacker could modify them to point to arbitrary locations. I am not
> sure if this type of attack is easier than modifying the JAR files, or not.

Well, the issue here is a combination of where the files are
generated and performance. Again, it's all a bit theoretical, because
the performance discussion is rooted in the "disk access is slow"
world, which isn't really true anymore. I've done some casual tests
on my laptop and did appear to see a real slowdown from this "hash
whole file on open" effect, but it was a while ago and it wasn't
rigorous at all. It's also a total PITA to reproduce, because there's
no explicit way to flush the cache, so you have to constantly re-copy
signed binaries over and over to force kernel cache misses. If I
explained how I measured this, Aleksey Shipilev would yell at me :)
so I'll just leave it here as food for thought instead. And it's also
not clear how much the uncached times matter these days. Years ago it
mattered a lot because people rebooted their machines often, but Macs
hibernate all the time and reboot only rarely, so the caches stay
warm.

I don't think treating AppCDS archives as hostile in the JVM itself
would be worth it. This is a Mac-specific issue, and that would be a
major constraint: e.g. it'd mean you can't cache JITed native code in
the archives. Doesn't make sense. Better to tamperproof unbundled
archives in other ways, like computing ad-hoc signatures and stashing
the CodeDirectory in an xattr (if it ever matters).
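On macOS that could be as simple as the following sketch. For a plain
data file (not a Mach-O binary), codesign stores the signature in
extended attributes on the file, which is more or less the xattr
stashing described above; the archive path is illustrative:

```shell
# Ad-hoc sign the generated archive ("-" means no signing identity);
# for a flat data file the CodeDirectory lands in an xattr.
codesign --sign - app.jsa

# Later, verify the archive before mapping it into the JVM:
codesign --verify --verbose app.jsa
```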


More information about the leyden-dev mailing list