CRaC + maven-daemon: An experience report

Fri Jun 10 13:29:18 UTC 2022

On 6/8/22 16:19, Dan Heidinga wrote:
>>> Snapsafety
>>
>> I still struggle to understand what it is. Is it a property of the code
>> (e.g. if you use these classes, you are safe w.r.t. checkpoint and
>> restore and don't need to coordinate explicitly)? Or is it a property of
>> the state (the state can be safely checkpointed and restored -- what is
>> safe in this case)?
>>
>
> This is exactly the point Ashu's making in the document.  As much as I
> think we would all like snapsafety to be a static property of the
> source code so we could analyze it easily with some static analysis,
> it's unfortunately more complicated than a static property.
>

Hence my question :) Let's say it is a property of the object
state.  Then a class has the property if all objects of the class have
it in all possible states.  I don't see any other way for correctness
and security to be defined for a class, other than providing a Resource
implementation on a per-class basis and taking care of the class's
functionality and internal invariants.  That is, no marker or predicate
on the code or on the Resource implementation can imply real safety --
that will always remain a non-formal property that must be kept aligned
with the surrounding context in the class.  For example, adding another
field and its initialization can change the class from safe to unsafe.
Such a change is hard to correlate with the necessary changes in the
Resource implementation.
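
To make the last point concrete, here is a minimal sketch. It uses a
simplified stand-in for the CRaC Resource interface (the real
org.crac.Resource methods also take a Context argument); the
SessionCache class and its fields are hypothetical. afterRestore()
re-establishes the invariant for the field it knows about, but a field
added later (createdAtNanos) is never refreshed, so the class becomes
unsafe without any change to its Resource code.

```java
import java.util.UUID;

// Simplified stand-in for org.crac.Resource (the real interface's methods
// also receive a Context argument). Assumption for illustration only.
interface Resource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

// Hypothetical class: each restored process must get a fresh instance id.
class SessionCache implements Resource {
    private String instanceId = UUID.randomUUID().toString();

    // Field added later: System.nanoTime() values are only meaningful
    // within one JVM instance. Nothing in afterRestore() refreshes it, so
    // the class is no longer safe even though the Resource code is unchanged.
    private long createdAtNanos = System.nanoTime();

    @Override
    public void beforeCheckpoint() {
        // Nothing to release in this sketch.
    }

    @Override
    public void afterRestore() {
        // Re-establishes the invariant only for the field it knows about.
        instanceId = UUID.randomUUID().toString();
    }

    String instanceId() { return instanceId; }
    long createdAtNanos() { return createdAtNanos; }
}
```

No annotation on SessionCache or on its Resource methods would reveal
that createdAtNanos is stale after restore; only reviewing the class as
a whole does.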

> There's a temporal aspect to it - when the checkpoint is taken affects
> the safety of the operation.  When the snapshot is taken determines
> what would need to be fixed up (and much of that is based on
> application specific invariants).
>
> The execution model on restore [0] also impacts the snapsafety.  As
> Ashu says, using the checkpoint to create an initialized base image
> has a different concept of "safety" than migrating a computation from
> one host to another.  Different pieces of state will need to be
> modified in each case and different invariants will hold (or be
> broken).
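
A sketch of how the restore model changes the required fix-ups. The
RestoreMode enum and WorkerState class are hypothetical: CRaC's
afterRestore hook receives no information about why the process was
restored. The point is only that a base-image restore and a migration
touch different state.

```java
import java.security.SecureRandom;
import java.util.Random;
import java.util.UUID;

// Hypothetical: CRaC does not tell afterRestore() why the process was restored.
enum RestoreMode { BASE_IMAGE, MIGRATION }

// Hypothetical worker state that a checkpoint would capture.
class WorkerState {
    String workerId = UUID.randomUUID().toString();
    Random random = new Random(42);
    String host = "build-host";

    void fixUpAfterRestore(RestoreMode mode, String currentHost) {
        // Both cases: the process may now run on a different host.
        host = currentHost;
        switch (mode) {
            case BASE_IMAGE:
                // Many processes start from the same image: identity and
                // random state must diverge, or every copy behaves identically.
                workerId = UUID.randomUUID().toString();
                random = new Random(new SecureRandom().nextLong());
                break;
            case MIGRATION:
                // The same computation continues elsewhere: keep its identity
                // and random sequence so in-flight work stays consistent.
                break;
        }
    }
}
```

Re-seeding the generator is required in one model and wrong in the
other, so "safe" cannot be decided without knowing the execution model.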

Indeed. Is the formal property worth pursuing, then?  This was going to
be a language aid for app developers to annotate safe parts of their
programs, and for us to annotate parts of the JDK.  While we can attempt
to annotate the JDK correctly and fully, we cannot control how the
language feature will be used by users.  And for them, a better
annotation mechanism or a programming model (like reactive programming)
may exist.  How about letting users decide how and when to annotate
their programs, and concentrating first on the JDK's needs and how the
JDK is used by applications, as we understand these better?  For
example, what's missing in the JDK so that the app won't need changing
at all?  And what parts of the app absolutely have to change?  Is it
possible for the JDK to provide a set of utilities to ease those
changes?

> The .NET community took an interesting approach in their "Native AOT"
> story for "trimming" applications [1] that may be reusable for
> snapsafety - they added warnings for certain operations that are
> incompatible with trimming (dead code elimination) and then required
> library authors to annotate methods that generate the warnings.
> The annotations bubble up the call chain to the public APIs and then
> library consumers can determine whether to call such apis or not.
>
> Building on this idea, if methods and classes are correctly annotated
> (with what annotations?  tbd) it may be possible to do some analysis
> when the checkpoint is created to determine whether the current state
> is "snapsafe" or not.  This is not so much a static property that can
> be statically analyzed, but one that must be checked when taking the
> checkpoint as it may require walking stacks (currently executing
> methods), examining loaded classes, heap walks(?), etc.
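
A rough Java sketch of such a checkpoint-time check, assuming a
hypothetical @NotSnapsafe annotation (no such annotation exists in the
JDK or in CRaC). It uses StackWalker to inspect only the calling
thread's frames; a real check would also need the other threads, loaded
classes and possibly the heap.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical marker, similar in spirit to .NET's trimming annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NotSnapsafe {
    String value() default "";
}

final class SnapsafetyCheck {
    // RETAIN_CLASS_REFERENCE is required for getDeclaringClass()/getMethodType().
    private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    // Returns the currently executing methods (this thread only) that are
    // annotated @NotSnapsafe, innermost frame first.
    static List<String> unsafeFramesOnCurrentThread() {
        return WALKER.walk(frames -> frames
                .filter(SnapsafetyCheck::isAnnotated)
                .map(f -> f.getClassName() + "." + f.getMethodName())
                .collect(Collectors.toList()));
    }

    private static boolean isAnnotated(StackWalker.StackFrame frame) {
        try {
            Method m = frame.getDeclaringClass().getDeclaredMethod(
                    frame.getMethodName(), frame.getMethodType().parameterArray());
            return m.isAnnotationPresent(NotSnapsafe.class);
        } catch (NoSuchMethodException e) {
            // Constructors and static initializers are not plain methods.
            return false;
        }
    }
}

// Hypothetical library code: the annotation "bubbles up" to the public API.
class Library {
    @NotSnapsafe("holds a native handle")
    static List<String> readWithNativeHandle() {
        // Stands in for the checkpoint request arriving while this method runs.
        return SnapsafetyCheck.unsafeFramesOnCurrentThread();
    }

    @NotSnapsafe("calls readWithNativeHandle")
    public static List<String> publicApi() {
        return readWithNativeHandle();
    }
}
```

A non-empty result would mean the checkpoint is being requested while
code marked unsafe is on the stack.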

It's possible to implement a runtime check for object state safety by
implementing Resource's beforeCheckpoint.  An object in an unsafe state
can always throw an exception from it.  Wouldn't this be even more
flexible?  This relates closely to the "Snapsafety of core library
classes" thread [1]; I'll reply there.
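
A minimal sketch of that idea, again with a simplified stand-in for
org.crac.Resource (no Context argument) and a hypothetical Coordinator
playing the role of the checkpoint machinery. Each resource checks its
own current state in beforeCheckpoint and throws if it cannot be
checkpointed; in CRaC itself such failures surface to the caller as a
CheckpointException.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for org.crac.Resource (no Context argument).
interface Resource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

// Hypothetical resource whose safety depends on its current state.
class ConnectionHolder implements Resource {
    private boolean open;

    void open() { open = true; }
    void close() { open = false; }

    @Override
    public void beforeCheckpoint() throws Exception {
        // Runtime check instead of a static annotation: refuse while open.
        if (open) {
            throw new IllegalStateException("connection is open; close it before checkpoint");
        }
    }

    @Override
    public void afterRestore() {
        // A real implementation might reconnect lazily here.
    }
}

// Hypothetical stand-in for the checkpoint machinery.
class Coordinator {
    private final List<Resource> resources = new ArrayList<>();

    void register(Resource r) { resources.add(r); }

    // Calls beforeCheckpoint on every resource and returns the collected
    // failures; an empty list means the checkpoint may proceed.
    List<Exception> tryCheckpoint() {
        List<Exception> failures = new ArrayList<>();
        for (Resource r : resources) {
            try {
                r.beforeCheckpoint();
            } catch (Exception e) {
                failures.add(e);
            }
        }
        return failures;
    }
}
```

The same object is unsafe while open and safe once closed, which is the
state-dependent notion of safety discussed above.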

Thanks for bringing more context,
-- Anton

[1] https://mail.openjdk.java.net/pipermail/crac-dev/2022-May/000222.html