Call for Discussion: New Project: CRaC

Fri Jul 23 07:38:15 UTC 2021

On 7/21/21 4:27 PM, Andrew Dinn wrote:
> 
> Well, that's how it works at present given the current language spec and behaviour of the JVM, runtime, middleware and apps that are all designed with the expectation of operating in a single continuous dynamic runtime. But it does not necessarily have to remain that way.
> 
> One option for Java that would accommodate the static case might be to factor out static state initialization into computation of state that it is acceptable to precompute vs state that needs either to be initialized because it was not precomputed or, if it was already precomputed, re-initialized at runtime. There is also the opportunity to determine when it gets re-initialized -- e.g. before the program starts running or at the point where the program needs to make use of it.
> ...
> Well, yes, currently build time init requires a complex analysis of static init code. However, you would not necessarily need to have a static analysis if the language, runtime, middleware and apps were provided with a mechanism to define what can be pre-computed vs runtime computed vs re-computed.

New annotations sound like a substantial change in the language first.  I'm not
sure I completely understand re-compute, and why it is needed if you have
runtime initialization.  Either it is equivalent to the callbacks, which are
rather simple and provide a way to change parts of the state, or it is capable
of automatically tracking dependencies from the re-computed execution,
potentially creating two "worlds": before and after the re-compute.  For
example, what happens if there is a singleton object, referenced from a static
field, and it is swapped during re-computation?  What happens to other
instances that have seen the previous singleton?  Or if the singleton is
referenced from a field of an instance that is referenced from a static field,
is it the same?
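
To make the singleton question concrete, here is a minimal sketch in plain
Java.  All names (`Config`, `Client`, `recompute`) are hypothetical and only
illustrate the concern; this is not a proposal for how re-compute would
actually be implemented.

```java
final class StaleSingletonDemo {
    /** Hypothetical singleton whose content depends on the environment. */
    static final class Config {
        final String host;
        Config(String host) { this.host = host; }
    }

    // Static field that a re-compute step would replace.
    static Config current = new Config("build-host");

    /** An instance that captures whatever singleton exists when it is created. */
    static final class Client {
        private final Config seen = current;
        boolean seesCurrent() { return seen == current; }
    }

    /** Hypothetical re-compute step: swaps the static field and nothing else. */
    static void recompute() {
        current = new Config("run-host");
    }

    public static void main(String[] args) {
        Client early = new Client();   // created in the "before" world
        recompute();
        Client late = new Client();    // created in the "after" world
        System.out.println("early sees current: " + early.seesCurrent()); // false
        System.out.println("late sees current:  " + late.seesCurrent());  // true
    }
}
```

Unless something tracks and fixes up `early`, it keeps pointing at the old
singleton, which is the "two worlds" situation described above.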

> It would be interesting to consider how a restart of a frozen JVM state might also profit from a similar mechanism. There is clearly the potential to recompute existing (static) class state when an app is restarted, just as with a static app startup. However, there may also be a need to refresh instance state.

If there were a magic mechanism that automatically decided what needs to be
changed before run time, it could be re-used, roughly speaking, for automatic
generation of CRaC API callbacks.  But it should not invalidate too much, or it
would eliminate the benefit of having complete saved state that is ready to
run.  The ability to manually write the code that prepares for checkpoint and
updates the state after restore provides better control over the state, and the
behavior and performance at run time are clearer to the user.
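
As a rough illustration of the manual approach, the sketch below keeps the
expensive warmed-up state across a simulated checkpoint and refreshes only the
environment-dependent part in a callback.  It deliberately does not use the
jdk.crac classes, so that it runs on a stock JDK: `Resource`, `Context` and
`Service` are hypothetical stand-ins, and `checkpointRestore()` here only
simulates the save/restore step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class CallbackSketch {
    /** Hypothetical stand-in for a checkpoint/restore callback interface. */
    interface Resource {
        void beforeCheckpoint();
        void afterRestore();
    }

    /** Hypothetical registry playing the role of the runtime. */
    static final class Context {
        private final List<Resource> resources = new ArrayList<>();

        void register(Resource r) { resources.add(r); }

        /** Simulates a checkpoint immediately followed by a restore. */
        void checkpointRestore() {
            for (Resource r : resources) r.beforeCheckpoint();
            // A real implementation would save the process image here and
            // resume from it later, possibly in a different environment.
            for (Resource r : resources) r.afterRestore();
        }
    }

    /** Keeps warmed-up data, but re-reads one environment-dependent value. */
    static final class Service implements Resource {
        final List<String> warmedUp = new ArrayList<>(); // worth keeping as is
        private final Supplier<String> environment;      // assumed lookup hook
        String endpoint;                                  // must be refreshed

        Service(Supplier<String> environment) {
            this.environment = environment;
            this.endpoint = environment.get();
        }

        @Override public void beforeCheckpoint() { endpoint = null; }
        @Override public void afterRestore() { endpoint = environment.get(); }
    }

    public static void main(String[] args) {
        String[] host = {"build-host"};
        Service service = new Service(() -> host[0]);
        service.warmedUp.add("parsed-config");

        Context context = new Context();
        context.register(service);

        host[0] = "prod-host";          // the environment changes
        context.checkpointRestore();

        // prints "prod-host [parsed-config]"
        System.out.println(service.endpoint + " " + service.warmedUp);
    }
}
```

Only `endpoint` is invalidated; everything in `warmedUp` survives untouched,
which is the kind of control the callbacks are meant to give.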

Interestingly, it seems possible to express static Java app start-up with CRaC,
like this:

	doPrecompute();     // generated by javac
	jdk.crac.Core.checkpointRestore(); // also handles re-compute?
	doRuntimeCompute(); // generated by javac
	main(argc, argv);   // some computations are probably re-computed lazily

If we have a magic checkpoint/restore mechanism that optimizes the state and
compiles future executions, we'll have a static app.  But with CRaC, the
`checkpointRestore` call, as well as updating the state, can happen after the
main method has started and can capture some of the actual runtime behavior.

> It seems clear that for some apps it will be easy for them to correct their own internal app state using callbacks. However, it is not clear that JDK runtime state or JVM state will always be correct at restart of a frozen app. Certainly not if you transplant the app to a host with a different hardware, OS or process environment.
> 
> So, I think there is a need to look into how we can make the JDK and JVM play ball here and preferably in much the same way as we need to look at making it play ball with a static compile model.

In CRaC, the JVM and JDK are already involved in saving and restoring the
state.  They will very likely remain so with an external request to save the
state (e.g. a VM snapshot).  The footprint of the required changes is rather
small, I assume much smaller than would be required for the static image.  The
JDK library has a few resources that need releasing on checkpoint and
re-acquiring on restore.  In this sense it does not differ much from an
application or framework.  The JVM synchronizes its own state with the
checkpoint and also runs safety checks for resources.  These checks and the
implementation are intended to provide some independence from the process
environment.

In a proper implementation of CRaC, the JVM and JDK should be correct if
restored on the same CPU and the same operating system.  This could be relaxed
in one way or another.

In theory, leaving JNI aside, it's possible to imagine a fully abstract
checkpoint/restore mechanism that saves the runtime state in an abstract form
that can be restored on another CPU and OS.  But that is not a goal for the
CRaC project.

A more practical mechanism would sacrifice generality for the ability to use
existing generated code.  For CRaC we especially hope to reuse JIT-compiled
code.  Handling minor differences between CPU features on the same
architecture is what we will likely need to do in the Project.  For example,
this can be done by using a conservative feature set before the checkpoint and
switching to the full set after the restore.  The template interpreter and JIT
code could optionally be regenerated, likely while the old versions are still
being executed.  For the JIT code, this does not seem hard.
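
The conservative-set idea can be modelled in a few lines of plain Java.  This
is only a toy model, not how HotSpot selects CPU features: the `Feature` enum,
the `BASELINE` set and the method names are all assumptions made for the
illustration.

```java
import java.util.EnumSet;
import java.util.Set;

final class CpuFeatureSketch {
    /** Hypothetical feature flags on one architecture. */
    enum Feature { SSE2, AVX, AVX2, AVX512F }

    /** Assumed baseline that every machine we may restore on supports. */
    static final Set<Feature> BASELINE = EnumSet.of(Feature.SSE2);

    /** Before checkpoint: generated code may use only baseline features. */
    static Set<Feature> usableBeforeCheckpoint(Set<Feature> detected) {
        EnumSet<Feature> usable = EnumSet.noneOf(Feature.class);
        usable.addAll(BASELINE);
        usable.retainAll(detected);
        return usable;
    }

    /** After restore: regenerated code may use everything the new host has. */
    static Set<Feature> usableAfterRestore(Set<Feature> detectedOnRestoreHost) {
        EnumSet<Feature> usable = EnumSet.noneOf(Feature.class);
        usable.addAll(detectedOnRestoreHost);
        return usable;
    }

    /** Code is safe to run if the host supports every feature it uses. */
    static boolean canRunOn(Set<Feature> usedByCode, Set<Feature> host) {
        return host.containsAll(usedByCode);
    }

    public static void main(String[] args) {
        Set<Feature> checkpointHost = EnumSet.allOf(Feature.class);
        Set<Feature> restoreHost = EnumSet.of(Feature.SSE2, Feature.AVX);

        Set<Feature> conservative = usableBeforeCheckpoint(checkpointHost);
        System.out.println("conservative code runs after restore: "
                + canRunOn(conservative, restoreHost));
        System.out.println("full-feature code runs after restore: "
                + canRunOn(checkpointHost, restoreHost));
        System.out.println("set for regenerated code: "
                + usableAfterRestore(restoreHost));
    }
}
```

In this model, code generated with the conservative set stays valid on the
restore host, while code generated with the checkpoint host's full set would
not, which is why the switch to the full set has to wait until after restore.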

Thanks,
Anton