Portability of checkpoints?

Fri Oct 15 19:52:42 UTC 2021

How portable should CRaC checkpoints be?

When we were looking at checkpoint / restore in OpenJ9, one of the issues
we ran into early on was related to the portability of the checkpoints.
The use case was checkpointing an application server during a CI build and
then restoring it multiple times - basically speeding up the deployments by
shifting the work to the CI system.

The implication of this approach is that a checkpoint created on one
machine may not be valid on another: the target machine may differ in its
CPU architecture as well as in its environment.  It would be good if we
could surface a list of the things that will need to be changed in the JVM
and in the class libraries to address this.

I see a number of places in the CRaC code that implement jdk.crac.Resource
to add hooks to address environment changes.  I don't see a corresponding
set of changes for the JVM itself though.
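
To make the hook pattern concrete, here is a minimal sketch of a component
that caches an environment-derived value and refreshes it after restore.
CheckpointHook is a hypothetical stand-in I made up so the sketch compiles
on a JDK without CRaC; it only mirrors the two callbacks of
jdk.crac.Resource, which additionally take a Context argument and are
registered through Core.getGlobalContext().register(...).  CachedCpuCount
and the injected probe are likewise illustrative names, not CRaC code.

```java
import java.util.function.IntSupplier;

// Hypothetical stand-in for jdk.crac.Resource (assumption: the real
// callbacks also receive a Context argument, omitted here).
interface CheckpointHook {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

// Caches a value read from the environment and re-reads it on restore,
// because the restore machine may differ from the checkpoint machine.
final class CachedCpuCount implements CheckpointHook {
    private final IntSupplier probe; // how to query the current machine
    private int cpus;                // value cached at construction time

    CachedCpuCount(IntSupplier probe) {
        this.probe = probe;
        this.cpus = probe.getAsInt();
    }

    int cpus() {
        return cpus;
    }

    @Override
    public void beforeCheckpoint() {
        // Nothing to release in this sketch.
    }

    @Override
    public void afterRestore() {
        // The cached value may be stale on the new machine; refresh it.
        cpus = probe.getAsInt();
    }
}

final class CheckpointHookDemo {
    public static void main(String[] args) throws Exception {
        CachedCpuCount count =
            new CachedCpuCount(() -> Runtime.getRuntime().availableProcessors());
        count.beforeCheckpoint();
        // ... checkpoint on the CI machine, restore somewhere else ...
        count.afterRestore();
        System.out.println("CPUs after restore: " + count.cpus());
    }
}
```

The JVM-level question is whether HotSpot's own cached assumptions (CPU
features, memory sizes and so on) get an equivalent refresh step.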

As an example, in OpenJ9 we added a command-line option to tell the JIT to
generate more conservative code - to compile as though running on an older
architecture so that the code is applicable across a greater set of target
machines.  Does HotSpot have similar options already or do we need to
pursue adding them as part of this project?
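
As a rough illustration of what "more conservative" means here, the sketch
below computes the set of processor features common to every target
machine and checks whether code compiled for a given feature set is safe
on a particular target.  The CpuFeature enum and the FeatureBaseline class
are hypothetical names for illustration only; this is not how OpenJ9 or
HotSpot represent processor features.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical feature list for illustration only.
enum CpuFeature { SSE2, SSE4_2, AVX, AVX2, AVX512F }

final class FeatureBaseline {
    // Features present on every machine in the list: the most a JIT can
    // assume if its code must run on all of them. An empty list places
    // no constraint, so every feature is returned.
    static EnumSet<CpuFeature> commonFeatures(List<? extends Set<CpuFeature>> machines) {
        EnumSet<CpuFeature> common = EnumSet.allOf(CpuFeature.class);
        for (Set<CpuFeature> machine : machines) {
            common.retainAll(machine);
        }
        return common;
    }

    // Code compiled assuming 'compiledFor' is safe on 'target' only if
    // the target supports every assumed feature.
    static boolean canRestoreOn(Set<CpuFeature> compiledFor, Set<CpuFeature> target) {
        return target.containsAll(compiledFor);
    }
}
```

Compiling against the common set trades some peak performance on the
newer machines for a checkpoint that restores on all of them.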

The discussion in [1] covers some of the background on determining default
processor features and [2] is a list of differences between
creation/restore environments that will need to be addressed for
portability.

Looking forward to hearing others' thoughts on this,

--Dan

[1] https://github.com/eclipse-openj9/openj9/issues/7966
[2] https://github.com/eclipse-openj9/openj9/issues/12484