Coordinated Restore at Checkpoint: A new project for start-up optimization?
gil at azul.com
Thu Sep 10 18:25:00 UTC 2020
We would like to open a discussion about a new project focused on
"Coordinated Restore at Checkpoint".
A possible relevant project name might be Tubthumpting.
Over the years, we [at Azul] have tinkered with various ways to improve Java
start-up time and warmup behavior across different use cases. One of the
interesting focus areas has been the "starting of
a new instance" of an application that has already run instances using identical
code, a similar expected profile, and potentially a similar initialization
sequence in the past. This is a common scenario in modern application
deployments, e.g. when rolling out new code in a continuous deployment
environment, or when elastically changing instance counts in e.g.
Checkpoint/Restore technologies have evolved in various forms over the past
few years, and are available in multiple forms, including e.g. CRIU
and Docker Checkpoint & Restore. While Checkpoint/Restore capabilities
have been shown to work across a wide range of applications for e.g. live
process or application migration, there are various challenges present for
their generic application for new instance deployment. Many of these
challenges have to do with the need to deal with a checkpointed state that may
not be validly reproducible when restoring multiple instances from the same
This is where Coordinated Restore at Checkpoint (CRaC) comes in. At a high
level, CRaC aims to systemically address these challenges by facilitating
explicit and intentional coordination between checkpointed applications and
a checkpointing mechanism. Such coordination will allow applications to
proactively discard problematic state ahead of checkpointing and to
reestablish needed state upon restoration. [e.g. closing open file
descriptors ahead of a checkpoint, and recreating and binding them after a
Coordination is a powerful enabler in this space. In contrast to approaches
that attempt transparent, uncoordinated checkpoint/restore, CRaC's approach to
date has focused on assisting with the detection of situations that would
prevent a successful checkpoint, and simply refusing to checkpoint if such
conditions are identified. This approach leaves it up to the application
frameworks and the applications themselves to remedy the situation during
development, and before attempting actual deployment (or to simply accept
non-CRaC startup times since a restorable checkpoint state will not be
In the Java arena, we aim to create a generic CRaC API that would allow
applications and/or application frameworks to coordinate with an arbitrary
checkpoint/restore mechanism, without being tied to a specific
implementation or to the operational means by which checkpointing and
restoration are achieved. Such an API would allow application frameworks
(e.g. Tomcat, Quarkus, Micronaut, etc.) to perform the needed coordination
in a portable way, which would not require coding that is specific to a
checkpoint/restore mechanism. E.g. the same Tomcat CRaC coordination code
would be able to properly coordinate with a generic Linux CRIU utility, with
Docker Checkpoint & Restore, or with future OpenJDK implementations that may
support checkpoint/restore functionality directly or via the use of
libraries or system services.
Our hope is to start a project that will focus on specifying a CRaC API, and
will provide at least one CRaC-supporting checkpoint/restore OpenJDK
implementation with the hope of eventual upstream inclusion in a future
OpenJDK version via associated JEPs. We would potentially want to include
the API in a future Java SE specification as well.
In reality, we expect that more than one checkpoint/restore mechanism may be
supported, as we have already identified at least two probable modes of
operation that would be useful for OpenJDK:
- We have prototyped a JDK-driven, modified-CRIU-based
checkpoint/restore implementation that leverages on-demand paging during
startup to deliver very promising start times for e.g. microservices
running on Quarkus, Micronaut, and Tomcat, reaching a "full speed"
condition in sub-50-msec times.
- We anticipate that external-to-the-JDK checkpoint/restore implementations
such as Docker Checkpoint & Restore, and potential support within
orchestration frameworks (such as future Kubernetes versions), will drive
a need for non-Java-specific means of coordinating restoration from
checkpointed conditions, and that in such environments JDKs will likely
wish to provide external controls (such as jcmd or other APIs) that would
deal with coordination, but leave the actual checkpointing and restore
work to external entities.
Below are short summaries of:
- CRaC API concepts
- What a prototype OpenJDK implementation looks like
- Preliminary uses of CRaC API in some application frameworks
- Some promising preliminary results
What do you think? Please chime in.
P.S. Anton Kozlov has done the vast majority of the technical work on this
so far, and will be joining the discussion here.
CRaC API, conceptually
The high-level concepts of a CRaC API as we see it thus far include:
- Application code (a "resource") can register its interest in coordinating
with checkpoint/restore operations.
- When a checkpoint operation attempt is initiated, and before a checkpoint
is actually taken, all registered "resources" will be notified that a
checkpoint is being attempted via e.g. a beforeCheckpoint() call.
- A JDK may (and likely will) refuse to complete a checkpoint attempt if it
encounters any application state that it does not know how to checkpoint
or restore. E.g. a JDK may (and likely will) refuse to complete a
checkpoint attempt if any file descriptors that are not private to the
JDK itself are open after all registered resources have been notified
about the coming checkpoint attempt.
- When a restore operation occurs, all registered resources will be notified
via e.g. an afterRestore() callback.
- Upon being notified of a coming checkpoint, a resource is responsible for
destroying any state that may prevent the capturing of a checkpoint (e.g.
close any objects that it is responsible for and that may keep open file
descriptors), as well as for capturing whatever information it may need
in order to continue successfully after a restore (e.g. the knowledge of
what needs to be "opened" before a restore is complete).
- A resource may cause a checkpoint attempt to fail by throwing an exception
- Upon being notified that a restore has occurred, a resource is responsible
for any required restoration or recreation of the state that it destroyed
before the checkpoint occurred. [e.g. opening, binding, listening, and
possibly selecting on server ports that were closed for the checkpoint].
Note that although restoration is not functionally required in some cases,
it may still be beneficial for faster functional startup upon restoration.
E.g. outbound connections in a connection pool may not have to be
reconnected, as normal connection failure handling will likely deal with
their re-establishment in any case. However, initiating such reconnection
upon restore will likely improve functional startup time.
- A resource may indicate that a restore attempt should fail by throwing an
exception when notified.
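The concepts above can be sketched roughly as follows. This is only an illustration, not the proposed API: apart from the beforeCheckpoint() and afterRestore() method names mentioned above, every name here (Resource, Coordinator, CheckpointRefusedException) is a hypothetical placeholder, no process image is actually captured, and notifying resources in reverse registration order on restore is an assumption of this sketch rather than something stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: every type name here is hypothetical, not the proposed API.
final class CracConceptSketch {

    /** Application code that wants to coordinate with checkpoint/restore. */
    interface Resource {
        void beforeCheckpoint() throws Exception; // discard state that cannot be checkpointed
        void afterRestore() throws Exception;     // recreate the state discarded above
    }

    /** Signals that a checkpoint attempt was refused. */
    static final class CheckpointRefusedException extends Exception {
        CheckpointRefusedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Minimal registry that notifies resources around a simulated checkpoint. */
    static final class Coordinator {
        private final List<Resource> resources = new ArrayList<>();

        void register(Resource resource) {
            resources.add(resource);
        }

        /** Notifies every resource; any exception makes the attempt fail. */
        void checkpoint() throws CheckpointRefusedException {
            for (Resource resource : resources) {
                try {
                    resource.beforeCheckpoint();
                } catch (Exception e) {
                    throw new CheckpointRefusedException("a resource refused the checkpoint", e);
                }
            }
            // A real implementation would capture the process image here.
        }

        /** Assumption of this sketch: restore notifies in reverse registration order. */
        void restore() throws Exception {
            for (int i = resources.size() - 1; i >= 0; i--) {
                resources.get(i).afterRestore();
            }
        }
    }

    /** Builds a resource that only records the notifications it receives. */
    static Resource logging(String name, List<String> log) {
        return new Resource() {
            @Override public void beforeCheckpoint() { log.add("before:" + name); }
            @Override public void afterRestore() { log.add("after:" + name); }
        };
    }

    /** Registers two resources, then runs one checkpoint/restore cycle. */
    static String runDemo() {
        List<String> log = new ArrayList<>();
        Coordinator coordinator = new Coordinator();
        coordinator.register(logging("a", log));
        coordinator.register(logging("b", log));
        try {
            coordinator.checkpoint();
            coordinator.restore();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return String.join(",", log);
    }

    /** Shows that a throwing resource causes the checkpoint attempt to fail. */
    static boolean vetoIsReported() {
        Coordinator coordinator = new Coordinator();
        coordinator.register(new Resource() {
            @Override public void beforeCheckpoint() {
                throw new IllegalStateException("not ready to be checkpointed");
            }
            @Override public void afterRestore() { }
        });
        try {
            coordinator.checkpoint();
            return false;
        } catch (CheckpointRefusedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
        System.out.println(vetoIsReported());
    }
}
```

A fuller design would also have to decide what happens to resources that were already notified when a later resource refuses the checkpoint; the sketch leaves that open.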
Prototype JDK implementation
The prototype JDK implementation implements Coordinated Checkpoint and
Restore using a modified version of CRIU. A snapshot image of the JDK process
is created at an arbitrary point in time; the image is later used to start a
copy of the process that is identical to the original one.
Hotspot change highlights:
- Adds a Coordinated Checkpoint and Restore implementation for Linux
  - the checkpoint is performed in a JVM safepoint
  - currently depends on being able to reuse the checkpointed process pid
    [not a problem in containers]
- Adds a jcmd command for initiating Checkpoint (does not yet pass error
information on failure)
- Enforces that no Java user-visible file or socket resources are open at
checkpoint time. The exception message indicates the problematic resource
- Changes in PerfMemory (/tmp/hsperfdata<user>/<pid>) to work across multiple
- Performs GC on checkpoint and zeros unused heap memory to minimize
JDK change highlights:
- a jdk.crac API providing Checkpoint and Restore notifications
- uses of the jdk.crac API within the JDK:
  - support in sun.nio.ch.EPollSelectorImpl to handle epoll and pipe
  - jar file handling by the JDK
  - support in java.net.PlainSocketImpl and sun.nio.ch.FileDispatcherImpl
    to handle internal socket used for preclose
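To illustrate the "refuse to checkpoint while user-visible resources are open" behavior listed in the Hotspot highlights, here is a minimal sketch. It does not reflect how the prototype actually detects open resources, which is not described here; it swaps in simple explicit bookkeeping instead. All names (OpenResourceTracker, CheckpointException, the resource descriptions) are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not the prototype's actual code.
final class OpenResourceCheckSketch {

    /** Raised when a checkpoint is refused; the message names the offenders. */
    static final class CheckpointException extends Exception {
        CheckpointException(String message) {
            super(message);
        }
    }

    /** Tracks user-visible resources by a human-readable description. */
    static final class OpenResourceTracker {
        private final Set<String> open = new TreeSet<>();

        synchronized void opened(String description) {
            open.add(description);
        }

        synchronized void closed(String description) {
            open.remove(description);
        }

        /** Refuses the checkpoint if anything is still open. */
        synchronized void verifyNoneOpen() throws CheckpointException {
            if (!open.isEmpty()) {
                throw new CheckpointException("cannot checkpoint, still open: " + open);
            }
        }
    }

    /** Returns the refusal message, or a fixed string if the check passes. */
    static String attempt(OpenResourceTracker tracker) {
        try {
            tracker.verifyNoneOpen();
            return "checkpoint allowed";
        } catch (CheckpointException e) {
            return e.getMessage();
        }
    }

    /** First attempt fails because a socket is open; second one passes. */
    static String demo() {
        OpenResourceTracker tracker = new OpenResourceTracker();
        tracker.opened("file:/tmp/app.log");
        tracker.opened("socket:0.0.0.0:8080");
        tracker.closed("file:/tmp/app.log");
        String first = attempt(tracker);
        tracker.closed("socket:0.0.0.0:8080");
        String second = attempt(tracker);
        return first + " | " + second;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Naming the offending resource in the exception message is what lets the problem be fixed during development rather than discovered at deployment.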
Preliminary uses of CRaC API in some application frameworks
AKA: What modifying common application frameworks to use a proposed CRaC API
successfully on a prototype OpenJDK implementation looks like.
The CRaC API was used to create modified versions of Quarkus, Micronaut
and Tomcat (used by Spring Boot in our examples). The amount of code
change required has been surprisingly small.
All three frameworks successfully coordinate checkpoint and restore
operations with the prototype JDK without requiring any changes to the
example code that runs on the framework. It is hoped that a large majority
of applications that run on such frameworks would not require any CRaC API
use, and CRaC awareness will only be needed at the framework and potentially
at the library levels in most cases.
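As a rough idea of what framework-level coordination could look like, the sketch below shows a hypothetical connector that closes its listening socket before a checkpoint and binds a fresh one after a restore. It is not taken from the modified Quarkus, Micronaut, or Tomcat code; the Resource interface is redeclared locally as a placeholder, and the Connector class and its methods are invented for illustration. Port 0 is used so that the example binds to any free loopback port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative only: hypothetical names, not code from any real framework.
final class ConnectorSketch {

    /** Local placeholder for a checkpoint/restore notification interface. */
    interface Resource {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    /** A framework component that owns a listening socket. */
    static final class Connector implements Resource {
        private final int configuredPort; // remembered so the socket can be recreated
        private ServerSocket serverSocket;

        Connector(int configuredPort) {
            this.configuredPort = configuredPort;
        }

        /** Opens and binds the listening socket on the loopback interface. */
        void start() throws IOException {
            ServerSocket socket = new ServerSocket();
            socket.setReuseAddress(true);
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), configuredPort));
            serverSocket = socket;
        }

        boolean isListening() {
            return serverSocket != null && !serverSocket.isClosed();
        }

        /** Closes the socket so that no file descriptor is open at checkpoint time. */
        @Override
        public void beforeCheckpoint() throws IOException {
            if (serverSocket != null) {
                serverSocket.close();
                serverSocket = null;
            }
        }

        /** Recreates the socket that was closed for the checkpoint. */
        @Override
        public void afterRestore() throws IOException {
            start();
        }
    }

    /** Records whether the connector is listening at each stage of one cycle. */
    static String demo() {
        Connector connector = new Connector(0); // 0 = any free port
        StringBuilder states = new StringBuilder();
        try {
            connector.start();
            states.append(connector.isListening());
            connector.beforeCheckpoint();
            states.append(',').append(connector.isListening());
            connector.afterRestore();
            states.append(',').append(connector.isListening());
            connector.beforeCheckpoint(); // close again so the demo leaves nothing open
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return states.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Application code running on such a framework would never see the socket being closed and reopened, which is why CRaC awareness can stay at the framework level in this kind of design.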
Promising Preliminary Results
The current prototype has demonstrated <50msec startup times for fully
warmed microservice examples running on modified Spring Boot, Quarkus, and
The examples demonstrate fully-JIT'ed performance out of the box: the
immediate throughput of these <50msec starts matches the throughput achieved
by a normal OpenJDK start only after the latter has fully warmed up, and
after it had executed >10,000 example operations at significantly slower