Snapsafety of core library classes

Thu May 19 12:09:09 UTC 2022

Hi,

I wonder if anybody has thought about how snapsafety for the core
library classes should be implemented in CRaC? By "snapsafety" I mean
correct and secure operation after restoring a JVM process which was
previously checkpointed and possibly cloned.

The first question is how to decide which classes can be considered
snapsafe. Naively, any class whose objects hold some state will be
affected by snapshotting and cloning. For simple classes like String
or Integer we know that their objects are constant and cloning them
doesn't do any harm. Objects of other classes might however contain
more sensitive state like caches, unique identifiers, certificates,
encryption keys etc. which shouldn't be cloned or which become invalid
after restore.
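To make the cloning concern concrete, here is a minimal sketch. It
assumes that two processes restored from the same snapshot start with
identical generator state, which is modeled by seeding two
java.util.Random instances with the same value. The class name
CloneHazardDemo and the seed are hypothetical and only serve the
illustration.

```java
import java.util.Random;

// Hypothetical illustration: two "clones" restored from one snapshot share
// identical generator state, modeled here by reusing the same seed.
class CloneHazardDemo {
    // Produces the next supposedly unique token from the generator's state.
    static long nextToken(Random generator) {
        return generator.nextLong();
    }

    public static void main(String[] args) {
        long snapshotSeed = 42L; // stands in for state captured at checkpoint
        Random cloneA = new Random(snapshotSeed);
        Random cloneB = new Random(snapshotSeed);
        // Both clones emit the same token, so it is no longer unique.
        System.out.println(nextToken(cloneA) == nextToken(cloneB)); // prints true
    }
}
```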

By looking at the current CRaC repository [1] I can see that some
classes (e.g. sun.security.provider.SecureRandom or
sun.security.provider.NativePRNG.RandomIO) directly implement
j.i.c.JDKResource in order to make them snapsafe. But all the classes
which do so are non-public. This means that snapsafety is currently a
"hidden", implicit feature of some classes in the core library (i.e.
if I create a new j.s.SecureRandom object, I cannot know if it will
be snapsafe or not).

Do we want to make snapsafety an undocumented, implicit feature or do
we want to explicitly call it out in the JavaDoc, e.g. by forcing
classes which want to be snapsafe to implement javax.crac.Resource
(similar to implementing Serializable)?
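As a rough sketch of what the explicit model could look like, the
following uses a hypothetical stand-in interface, SnapshotResource, so
that it compiles on a stock JDK. It is not the javax.crac.Resource
API, whose callbacks also receive a context argument. InstanceId and
InstanceIdDemo are likewise made-up names. The point is only that the
class advertises snapsafety through its declared type and refreshes
its sensitive state after restore.

```java
import java.util.UUID;

// Hypothetical stand-in for javax.crac.Resource so the sketch compiles on a
// stock JDK; the real interface's callbacks also take a context argument.
interface SnapshotResource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

// Hypothetical class that advertises snapsafety through its declared type.
class InstanceId implements SnapshotResource {
    private volatile String id = UUID.randomUUID().toString();

    String value() {
        return id;
    }

    @Override
    public void beforeCheckpoint() {
        // Drop the identifier so it is never written into the snapshot image.
        id = null;
    }

    @Override
    public void afterRestore() {
        // Each restored clone draws a fresh identifier.
        id = UUID.randomUUID().toString();
    }
}

class InstanceIdDemo {
    public static void main(String[] args) throws Exception {
        InstanceId instance = new InstanceId();
        String before = instance.value();
        // Simulate the callbacks a checkpoint/restore cycle would deliver.
        instance.beforeCheckpoint();
        instance.afterRestore();
        System.out.println("id changed: " + !before.equals(instance.value()));
    }
}
```

With this shape, a user could check `obj instanceof SnapshotResource`
to find out whether a class claims to be snapsafe, much like checking
for Serializable.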

I think both approaches have their pros and cons. If we make
snapsafety an explicit feature, we tell users that the corresponding
classes will behave correctly on snapshot and restore events. But what
about all the other classes in the core libraries? Are they all
snapsafe or snapunsafe by default?

If we make snapsafety an implicit feature, it would become an
"implementation detail". This means we could have JDKs which are
snapsafe while others are not. It also means we could make older JDK
versions snapsafe, which would not be possible with the explicit model
because it is impossible to retrofit classes in older releases to
implement new interfaces.

@Dan: I remember you've mentioned that you've experimented with CRIU
in OpenJ9 as well. I'd be specifically interested in the core
library changes you had to make in order to make the JDK snapsafe. I
took a look at the OpenJ9 snapshot branch [2], but couldn't find any
library changes there at all. Could you please share more details on
this topic if possible?

What are your thoughts on this issue?

Best regards,
Volker

[1] https://github.com/openjdk/crac/compare/crac?expand=1#diff-b7061481
[2] https://github.com/eclipse-openj9/openj9/compare/snapshot#diff-54ac925d