[crac] RFR: Portable C/R [v2]
Anton Kozlov
akozlov at openjdk.org
Fri Jun 7 16:08:42 UTC 2024
On Tue, 4 Jun 2024 17:20:59 GMT, Timofei Pushkin <duke at openjdk.org> wrote:
>> Implements a proof-of-concept "portable mode" for CRaC: a checkpoint-restore mechanism that does not rely on platform-dependent tools like CRIU, instead saving VM state in terms of the Java specification (with some HotSpot specifics). This allows restoring the saved state on machines with a different CPU architecture and OS. A demo is available [here](https://github.com/TimPushkin/portable-crac-demo).
>>
>> Expected downsides compared to the traditional CRaC are restrictions on platform-dependent code usage (e.g. at the moment of checkpoint no native methods can be executing, off-heap memory obtained via `sun.misc.Unsafe` should be released) and somewhat slower restoration speeds (because platform-dependent state, including JIT-compiled code, should be re-created). In the future, Project Leyden may help with the latter.
>>
>> The mechanism is implemented as an internal part of HotSpot; it gets activated when an empty `CREngine` VM option is passed (i.e. `-XX:CREngine=""`, this is a temporary solution). Main implementation details are described in [this doc](https://github.com/TimPushkin/crac/blob/portable-cr/trimmed/doc/portable-cr.md).
>>
>> Since this is a proof-of-concept implementation, it currently lacks some important features. E.g. at the moment some early-initialized classes are not restored, most JDK classes have not yet been properly adapted, checkpointing via `jcmd` is not fully supported, and additional tests and optimizations are needed.
>
> Timofei Pushkin has updated the pull request incrementally with one additional commit since the last revision:
>
> Update full name
Hi all,
I've participated as an advisor in this work, and I also think the R&D that @TimPushkin did is very impressive.
The portable, or Java-level, CRaC can be used in many use cases, such as providing an implementation on non-Linux platforms, helping even more with debugging CRaC applications, or serving as a fallback when restore is not possible on the current platform (e.g. if the CPU architectures at checkpoint and restore differ). But among those, I believe there are two notable cases.
First, it provides a zero-effort Java-level optimization framework. Many frameworks that target start-up optimization opt for heavy build-time processing to serialize their anticipated state into simple Java programs that recreate that state. Java-level CRaC can do that serialization at the platform level. As with CRaC in general, the state is defined explicitly by Java program execution rather than by the more cumbersome generation of Java bytecode.
Second, Java-level CRaC makes it easier to use Java tools for operations and transformations on the image. In the current state, the image format for the heap is .hprof, which makes it trivial to peek into the image and its state. Such a Java checkpoint can then be further optimized with AOT compilation, e.g. by generating JIT code at the time of checkpoint (assuming multiple architectures), or, more likely, by feeding the Java checkpoint to Leyden for optimization.
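To illustrate how little is needed to peek into such an image, here is a minimal sketch that reads only the file header. It assumes the heap part of the image follows the standard HPROF binary layout used by HotSpot heap dumps (a NUL-terminated version string, a 4-byte identifier size, and an 8-byte timestamp in milliseconds); whether the portable image matches this exactly is an assumption, and the class name `HprofHeaderPeek` and the file path passed on the command line are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HprofHeaderPeek {
    // Fields of the fixed-size header that precedes all HPROF records.
    record Header(String version, int idSize, long timestampMillis) {}

    // Reads the header from the start of the stream (assumed standard HPROF layout).
    static Header readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);

        // Version string, e.g. "JAVA PROFILE 1.0.2", terminated by a zero byte.
        ByteArrayOutputStream version = new ByteArrayOutputStream();
        int b;
        while ((b = data.readUnsignedByte()) != 0) {
            version.write(b);
        }

        int idSize = data.readInt();      // size of object identifiers, in bytes
        long timestamp = data.readLong(); // dump time, milliseconds since the epoch

        return new Header(version.toString(StandardCharsets.US_ASCII), idSize, timestamp);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HprofHeaderPeek <path-to-hprof-file>");
            return;
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Path.of(args[0])))) {
            System.out.println(readHeader(in));
        }
    }
}
```

Everything after the header is a sequence of tagged records, so existing heap-dump analysis tools should be able to take it from there, under the same layout assumption.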
The work looks great, but it is indeed too premature to be integrated into the main crac branch. I'm going to create a `crac-portable` branch and integrate the PR there.
-------------
PR Comment: https://git.openjdk.org/crac/pull/155#issuecomment-2155130049