Proposal for documentation and snapsafety

Wed Mar 15 14:36:33 UTC 2023

On 2/27/23 11:01, Radim Vansa wrote:
> While all JDK code can be eventually fixed in a similar way to SecureRandom I
> think that it's clear that not everything can be encapsulated. A good example
> are the environment variables, but also number of processors and many others
> [2].

It looks like the problem here is not technical correctness but rather the user
experience, right?  I.e. by providing good documentation (javadoc == Java SE
API), we can provide a good level of specification of what to expect on
checkpoint and on restore.  But adhering to that specification is complicated
for users, as it is still a hard task to find all uses of a method whose
semantics have changed in CRaC.  Is this understanding of the problem correct?

> I propose to tag any method/constructor that returns data that could be
> expected to stay constant in non-C/R app but often changes after restore, or
> an object that will need handling through a registered Resource, with
> @CracSensitive (3) annotation.  We will provide a tool that will report
> places that could call these methods, unless marked with @CracSafe
> annotation. This tool could work in a static way (scanning set of JARs,
> probably with a thin Maven plugin as well?) and as a javaagent, scanning
> classes as these get loaded.
>
> Naturally not all code invoking non-snap-safe methods is from user code, many
> cases come from the libraries. Alternative way to allow-list places calling
> @CracSensitive methods in places that cannot be changed directly would be
> provided, though eventually we aim at encouraging that the libraries adopt
> the @CracSafe internally.

While technically this is possible, there are a few drawbacks IMHO.  First, the
tool and annotations are interdependent, although the dependency of annotations
on the tool is implicit: annotations do not make any sense without the tool
checking them.  So either the tool and the annotations should both be
completely external to the JDK, or both of them should be in the JDK.

But I'm not sure the tool is the best approach.  It does not take advantage
of being able to track the exact calls of the annotated methods before the
checkpoint and after restore.  For example, querying the number of processors
is fine if it happens after the restore.  So the tool would need to somehow
distinguish calls of annotated methods before checkpoint (where previously
returned results may become obsolete) from those after restore; otherwise there
will be some number of false positives, and those false positives would require
some way to silence them after consideration.  Also, even before the
checkpoint, having a call in the code does not mean that it will actually be
called, e.g. because of some specific configuration that disables detection of
the number of processors.  So it seems that without pretty complex static
dataflow analysis we'll have another source of false positives.
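
To illustrate the second kind of false positive, here is a small hypothetical
example (the class and method names are made up for illustration only).  A
static scan would presumably flag the call to Runtime.availableProcessors()
here, even though it is never reached when the pool size is configured
explicitly:

```java
// Hypothetical example: the class and method names are illustrative only.
final class PoolSizing {
    /**
     * Returns the configured pool size, falling back to the processor count
     * only when nothing is configured.
     */
    static int poolSize(Integer configured) {
        if (configured != null) {
            // Explicit configuration: the processor count is never queried.
            return configured;
        }
        // Only this path reads a value that may change after restore.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(poolSize(4));    // configured path
        System.out.println(poolSize(null)); // fallback path, machine dependent
    }
}
```

Telling these two paths apart statically would require knowing the
configuration at analysis time.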

Have you considered taking advantage of actually running the program?  E.g.
recording stack traces for method calls and reporting them on the checkpoint,
like in PR #43 [1].  Compared to the separate tool, the call recording reports
only calls that have actually happened, and only those before the checkpoint.
The stack trace provides some information about how the result of the method
is going to be used, although not complete info.  The implementation will
probably be very simple, and by some convention we can agree on a way to
exclude some stack traces from reporting, e.g. by having a specific stack
trace element.
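
As a rough illustration of the recording idea, here is a minimal sketch in
plain Java.  It is not the implementation from PR #43; CallRecorder, its
methods, and the "cracSafe" marker name are all hypothetical, and a real
version would hook into the JDK methods themselves and into the actual
checkpoint notification:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; all names here are assumptions for illustration.
final class CallRecorder {
    // Assumed convention: a trace containing a frame with this method name
    // is excluded from the report.
    static final String EXCLUDE_MARKER = "cracSafe";

    private final List<StackTraceElement[]> recorded = new ArrayList<>();
    private boolean checkpointed = false;

    /** Records the current stack trace, but only before the checkpoint. */
    synchronized void record() {
        if (!checkpointed) {
            recorded.add(new Throwable().getStackTrace());
        }
    }

    /** Returns the traces to report at checkpoint, skipping excluded ones. */
    synchronized List<StackTraceElement[]> reportOnCheckpoint() {
        List<StackTraceElement[]> report = new ArrayList<>();
        for (StackTraceElement[] trace : recorded) {
            boolean excluded = Arrays.stream(trace)
                    .anyMatch(frame -> frame.getMethodName().equals(EXCLUDE_MARKER));
            if (!excluded) {
                report.add(trace);
            }
        }
        recorded.clear();
        checkpointed = true; // calls made after this point are not recorded
        return report;
    }

    /** Stand-in for an instrumented sensitive method. */
    int availableProcessors() {
        record();
        return Runtime.getRuntime().availableProcessors();
    }

    /** A call site reviewed as safe; its frame name acts as the marker. */
    int cracSafe() {
        return availableProcessors();
    }

    public static void main(String[] args) {
        CallRecorder recorder = new CallRecorder();
        recorder.availableProcessors(); // recorded and reported
        recorder.cracSafe();            // recorded, excluded by the marker frame
        for (StackTraceElement[] trace : recorder.reportOnCheckpoint()) {
            System.out.println(Arrays.toString(trace));
        }
    }
}
```

In this sketch only calls made before reportOnCheckpoint() are recorded, and a
reviewed call site is silenced purely by the presence of the marker frame in
its stack trace.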

Does the tool have an advantage over the recording of method calls and stack
traces?

[1] https://github.com/openjdk/crac/pull/43

Thanks,
Anton