Call for Discussion: New Project: Leyden

Thu Jun 4 10:11:38 UTC 2020

> Graal derives almost all of its space/time reductions relative to the
> dynamic JVM from being able to discard great swathes of class metadata and
> class initialization. A closed world assumption is critical to the validity
> and comprehensive applicability of both those dietary regimes.
>

The argument makes sense. Still, AppCDS makes images bigger, yet they start
faster. A CRIU snapshot of the JVM taken after reaching a startup checkpoint
would be bigger still. So I wonder if there's a risk of conflating space and
time optimisations more than is strictly necessary?

The big win for developer iteration comes only from startup time; image
size doesn't matter at all. The same goes for apps like IDEs and build
tools, as developer machines usually have plenty of free disk space. Is that
also true for cloud/containers? Sure, less disk space usage is nice and
obviously caches mean size and speed are interrelated, but a stock JVM with
jlink gets disk size down pretty far already and native code is quite
verbose. I found that for even toy "real apps" the native images could
quickly get as large as a regular JDK, simply because x86 machine code for
every possible code path adds up much faster than compressed bytecode does.

Demand for instant startup isn't driven only by dynamic scaling but also by
the wish to avoid complex deployments, where newly started servers need to
run a self-driven warm-up on test data because otherwise users who hit a
cold server get request timeouts. We had some nasty outages back in the day
at Google caused by the massive speed difference between cold and warm
servers; native images get rid of that as a concern completely (at the cost
of a more complex build/deploy loop). If people had to pick smaller
containers size-wise *or* fast startup and weren't allowed both, I bet
nearly all would go for the fast startup.
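
To make the warm-up step concrete, here is a minimal sketch in Java of a
readiness gate that replays sample requests through a handler before the
server reports itself ready. WarmUpGate, warmUp and isReady are hypothetical
names chosen for illustration, not part of any real framework, and the
number of rounds needed to reach JIT-compiled speed is an assumption that
would have to be measured per application.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

/** Hypothetical readiness gate: replays sample requests before reporting ready. */
final class WarmUpGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Runs the handler over every sample, `rounds` times, then marks the server ready. */
    <Q, R> void warmUp(Function<Q, R> handler, List<Q> samples, int rounds) {
        for (int i = 0; i < rounds; i++) {
            for (Q sample : samples) {
                // The result is discarded; the point is to exercise the hot path
                // so that class loading and JIT compilation happen before real traffic.
                handler.apply(sample);
            }
        }
        ready.set(true);
    }

    /** A health-check endpoint would return this so cold servers receive no traffic. */
    boolean isReady() {
        return ready.get();
    }
}
```

This is exactly the kind of deployment machinery that instant startup would
make unnecessary.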

If your hypothesis about the benefits primarily coming from discarding data
is correct (vs the more direct forms of linkage that an AOT compiler can
use), is it strictly necessary to discard the data, or just to not load it
on the hot paths? How much research has been done on which benefits users
are most sensitive to, and on the extent to which it's the closed world
optimisations that are driving them, vs all the pre-initialisation and heap
snapshotting?

Losing open world is quite painful from a frameworks perspective. A lot of
projects that are startup-time sensitive will be unable to go closed world
easily or at all, because they are heavily plugin based. Many projects will
be forced to pick between convenience and footprint. If it's truly
unavoidable then so be it.