Call for Discussion: New Project: CRaC
Volker Simonis
volker.simonis at gmail.com
Wed Jul 21 07:56:23 UTC 2021
Hi Christine,
thanks for joining the discussion :)
Please find my further comments inline.
On Tue, Jul 20, 2021 at 7:06 PM Christine Flood <chf at redhat.com> wrote:
>
> We at Red Hat have been working on this problem as well and I think now is
> a great time to sync our efforts.
>
> Our current project, Jigawatts 1.21 is based on allowing the user to
> specify precise checkpoints either by adding a method call, or manipulating
> bytecodes via Byteman.
> This code is separate from OpenJDK and will be distributed in its own
> Linux rpm.
>
> The next phase will require some changes to OpenJDK, specifically we are
> looking to do some optimizations at checkpoint time to improve
> startup/runtime.
>
> Here are two ideas.
>
> 1) Shrink the heap to just the live data size, this both guarantees that
> there are no secrets hidden in garbage objects and minimizes restore time.
>
> We can restore and immediately grow the heap.
>
> 2) Hot swap garbage collectors, this allows us to give fast startup and
> fast runtime by using the epsilon collector on restore, eliminating the
> space for card table and time for gc barriers. This will be particularly
> useful for programs which wish to run fast small apps against an already
> initialized data set.
>
> So, my question to you is does it make sense to combine these into one
> effort, or do we want to keep the projects separate for now? The efforts
> are focusing in two different areas, specifically my understanding is that
> CRAC wants to be able to checkpoint a JVM based on an external signal so at
> any point in the runtime while Jigawatts is based more on user controlled
> and JVM optimized checkpoints.
From my point of view it makes sense to combine the efforts. I think
CRaC should explore different ideas and directions (see my previous
mail). One of them will be how the JVM can implement and control
checkpointing functionality. That's what your Jigawatts project is
doing, but also what CRaC did in a POC based on CRIU.
The other direction CRaC should explore is how the JVM could react to
externally triggered checkpointing events.
Finally, I think we need a mechanism exposed through a standard API
which allows applications and frameworks to react to checkpointing
events no matter if these events are triggered internally by the JVM
or externally. Such a mechanism is especially needed in situations
where an application is not simply suspended and resumed but also
cloned several times after it was resumed (or resumed several times
from the same checkpointed state).
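To make the idea a bit more concrete, here is a rough sketch of what such
a notification mechanism could look like. All names in it
(CheckpointListener, CheckpointCoordinator, InstanceIdentity) are made up
for illustration and are not an existing JDK, CRaC or Jigawatts API. The
coordinator would be driven by whatever triggers the checkpoint, internal
or external; the application only sees the two callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical callback interface (an assumption, not a real API).
interface CheckpointListener {
    void beforeCheckpoint(); // e.g. close sockets and files, wipe secrets
    void afterRestore();     // e.g. reopen resources, refresh identity
}

// Hypothetical registry. The runtime (or an external agent) would call
// the notify methods regardless of who triggered the checkpoint.
final class CheckpointCoordinator {
    private final List<CheckpointListener> listeners = new ArrayList<>();

    void register(CheckpointListener listener) {
        listeners.add(listener);
    }

    void notifyBeforeCheckpoint() {
        // Reverse registration order, so later (dependent) components
        // are quiesced before the ones they were built on.
        for (int i = listeners.size() - 1; i >= 0; i--) {
            listeners.get(i).beforeCheckpoint();
        }
    }

    void notifyAfterRestore() {
        for (CheckpointListener listener : listeners) {
            listener.afterRestore();
        }
    }
}

// Example component: holds state that must not be shared between
// several instances restored from the same checkpoint.
final class InstanceIdentity implements CheckpointListener {
    private String id = UUID.randomUUID().toString();
    private boolean connected = true;

    @Override
    public void beforeCheckpoint() {
        connected = false; // stand-in for closing a real connection
    }

    @Override
    public void afterRestore() {
        id = UUID.randomUUID().toString(); // every restored clone diverges
        connected = true;                  // stand-in for reconnecting
    }

    String id() { return id; }
    boolean isConnected() { return connected; }
}

public class CheckpointSketch {
    public static void main(String[] args) {
        CheckpointCoordinator coordinator = new CheckpointCoordinator();
        InstanceIdentity identity = new InstanceIdentity();
        coordinator.register(identity);

        String idAtCheckpoint = identity.id();
        coordinator.notifyBeforeCheckpoint();
        // ... the state would be saved here by CRIU, a VM snapshot, etc. ...
        coordinator.notifyAfterRestore();

        System.out.println("id changed after restore: "
                + !idAtCheckpoint.equals(identity.id()));
    }
}
```

The sketch only simulates a single checkpoint/restore cycle inside one
process. In a real clone scenario each restored copy would run
afterRestore() on its own, which is where it would pick up a fresh
identity from its environment.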
>
>
> Christine
>
>
>
>
>
> On Sun, Jul 18, 2021 at 10:50 AM Anton Kozlov <akozlov at azul.com> wrote:
>
> > Hi,
> >
> > It's been a while since we presented Coordinated Restore at Checkpoint for
> > the
> > first time [0]. We are still committed to the idea and researching this
> > topic.
> >
> > Java applications can avoid the long start-up and warm-up by saving the
> > state
> > of the Java runtime (snapshot, checkpoint). The saved state is then used
> > to
> > start instances fast (restored). But after the state was saved, the
> > execution
> > environment could change. Also, if multiple instances are started from the
> > saved state simultaneously, they should obtain some uniqueness, and their
> > executions should diverge at some point.
> >
> > We believe that the practical way to solve these problems is to make Java
> > applications aware of when the state is saved and restored. Then an
> > application will be able to handle environmental changes. The application
> > will
> > also be able to obtain uniqueness from the environment.
> >
> > The CRaC project aims to research Java API for coordination between
> > application
> > and runtime to save and restore the state. Runtime should support multiple
> > ways to save the state: virtual machine snapshot, container snapshot, CRIU
> > project on Linux, etc. We hope to come up with an API that is general enough
> > for
> > any underlying mechanism. We also plan to explore safety checks in the
> > API and
> > runtime, which prevent saving the state if it may not be restored or work
> > correctly after the restore.
> >
> > I propose myself as a Project Lead of the CRaC Project. If you're
> > interested
> > or want to be the committer, please drop me a message.
> >
> > A fork of JDK [1] would be a starting point of this project.
> >
> > Thanks,
> > Anton
> >
> > [0]
> > https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
> > [1] https://github.com/CRaC/jdk
> >
> >