Call for Discussion: New Project: CRaC
Christine Flood
chf at redhat.com
Tue Jul 20 17:05:30 UTC 2021
We at Red Hat have been working on this problem as well and I think now is
a great time to sync our efforts.
Our current project, Jigawatts 1.21, is based on allowing the user to
specify precise checkpoints, either by adding a method call or by
manipulating bytecodes via Byteman.
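
As a rough illustration of the method-call style of checkpoint, here is a minimal sketch. It does not use the Jigawatts API, which is not shown in this message: `CheckpointHook`, `RecordingHook`, `WarmService` and the image directory argument are all hypothetical names. The stand-in hook only records the request so the sketch runs on any JDK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a checkpoint library; NOT the Jigawatts API.
interface CheckpointHook {
    void checkpoint(String imageDir);
}

// Records checkpoint requests instead of freezing the process,
// so the sketch can run on a stock JDK.
class RecordingHook implements CheckpointHook {
    final List<String> requests = new ArrayList<>();

    @Override
    public void checkpoint(String imageDir) {
        requests.add(imageDir);
    }
}

class WarmService {
    private final Map<Integer, Integer> squares = new HashMap<>();
    private final CheckpointHook hook;

    WarmService(CheckpointHook hook) {
        this.hook = hook;
    }

    void start(String imageDir) {
        // Expensive initialization we would like to pay for only once.
        for (int i = 0; i < 1000; i++) {
            squares.put(i, i * i);
        }
        // The user-chosen checkpoint: after initialization, before serving.
        hook.checkpoint(imageDir);
    }

    int lookup(int n) {
        return squares.get(n);
    }
}
```

The point of the sketch is only the placement of the call: the application, not an external signal, decides where the state is saved.
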
This code is separate from OpenJDK and will be distributed in its own
Linux RPM.
The next phase will require some changes to OpenJDK. Specifically, we are
looking to do some optimizations at checkpoint time to improve startup
time and runtime performance.
Here are two ideas.
1) Shrink the heap to just the live data size. This both guarantees that
there are no secrets hidden in garbage objects and minimizes restore time.
We can restore and immediately grow the heap.
2) Hot swap garbage collectors. This lets us offer both fast startup and
fast runtime by using the Epsilon collector on restore, eliminating the
space for the card table and the time spent in GC barriers. This will be
particularly useful for programs that wish to run fast, small apps against
an already initialized data set.
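
For idea 1, the gap between committed and used heap gives a rough, application-level view of how much a shrink to live data size could save. The sketch below only observes that gap through the standard `java.lang.management` API; it is not how a JVM-internal shrink would be implemented, and `HeapSlack` is a hypothetical name. Note that `gc()` is only a request, so the "used" figure may still include some garbage.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class HeapSlack {
    // Returns {used, committed} heap bytes after requesting a collection.
    static long[] usedAndCommitted() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        bean.gc(); // equivalent to System.gc(); the JVM may ignore it
        MemoryUsage heap = bean.getHeapMemoryUsage();
        return new long[] { heap.getUsed(), heap.getCommitted() };
    }

    // Committed-but-unused heap: an upper bound on what shrinking to the
    // currently used size would drop from a checkpoint image.
    static long reclaimableBytes() {
        long[] uc = usedAndCommitted();
        return uc[1] - uc[0];
    }
}
```

Calling `HeapSlack.reclaimableBytes()` just before a checkpoint would give an approximate figure for the savings.
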
So, my question to you is: does it make sense to combine these into one
effort, or do we want to keep the projects separate for now? The efforts
focus on two different areas. Specifically, my understanding is that CRaC
wants to be able to checkpoint a JVM based on an external signal, and so at
any point during its run, while Jigawatts is based more on user-controlled
and JVM-optimized checkpoints.
Christine
On Sun, Jul 18, 2021 at 10:50 AM Anton Kozlov <akozlov at azul.com> wrote:
> Hi,
>
> It's been a while since we presented Coordinated Restore at Checkpoint for
> the first time [0]. We are still committed to the idea and researching this
> topic.
>
> Java applications can avoid the long start-up and warm-up by saving the
> state of the Java runtime (snapshot, checkpoint). The saved state is then
> used to start instances fast (restored). But after the state was saved, the
> execution environment could change. Also, if multiple instances are started
> from the saved state simultaneously, they should obtain some uniqueness, and
> their executions should diverge at some point.
>
> We believe that the practical way to solve these problems is to make Java
> applications aware of when the state is saved and restored. Then an
> application will be able to handle environmental changes. The application
> will also be able to obtain uniqueness from the environment.
>
> The CRaC project aims to research Java API for coordination between
> application and runtime to save and restore the state. Runtime should
> support multiple ways to save the state: virtual machine snapshot, container
> snapshot, CRIU project on Linux, etc. We hope to come with an API that is
> general enough for any underlying mechanism. We also plan to explore safety
> checks in the API and runtime, which prevent saving the state if it may not
> be restored or work correctly after the restore.
>
> I propose myself as a Project Lead of the CRaC Project. If you're
> interested or want to be the committer, please drop me a message.
>
> A fork of JDK [1] would be a starting point of this project.
>
> Thanks,
> Anton
>
> [0] https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
> [1] https://github.com/CRaC/jdk
>
>