[crac] RFR: Portable C/R [v2]

Timofei Pushkin duke at openjdk.org
Fri Jun 7 08:02:38 UTC 2024


On Tue, 4 Jun 2024 17:20:59 GMT, Timofei Pushkin <duke at openjdk.org> wrote:

>> Implements a proof-of-concept "portable mode" for CRaC: a checkpoint-restore mechanism that does not rely on platform-dependent tools like CRIU, instead saving VM state in terms of the Java specification (with some HotSpot specifics). This makes it possible to restore the saved state on machines with a different CPU architecture and OS. A demo is available [here](https://github.com/TimPushkin/portable-crac-demo).
>> 
>> Expected downsides compared to the traditional CRaC are restrictions on platform-dependent code usage (e.g. at the moment of checkpoint no native methods can be executing, off-heap memory obtained via `sun.misc.Unsafe` should be released) and somewhat slower restoration speeds (because platform-dependent state, including JIT-compiled code, should be re-created). In the future, Project Leyden may help with the latter.
>> 
>> The mechanism is implemented as an internal part of HotSpot; it is activated when an empty `CREngine` VM option is passed (i.e. `-XX:CREngine=""`; this is a temporary solution). Main implementation details are described in [this doc](https://github.com/TimPushkin/crac/blob/portable-cr/trimmed/doc/portable-cr.md).
>> 
>> Since this is a proof-of-concept implementation, it currently lacks some important features. E.g. at the moment some early-initialized classes are not restored, most JDK classes have not yet been properly adapted, checkpointing via `jcmd` is not fully supported, and additional tests and optimizations are needed.
>
> Timofei Pushkin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update full name

Hello Radim,

The ultimate motivation is to speed up application boot while keeping the level of portability expected from the Java platform. As a specific practical example of where this is useful: users will not have to update the image when the shared libraries the JVM depends on are updated in the target environment, so deploying the image gets simpler.

Yes, this method will be slower than CRaC on CRIU (or other technologies operating at the level of OS processes), but it should be faster than restarting the program, at least in some scenarios. The greatest speedup is expected when the app performs a lot of work in terms of Java on boot: loading classes, making heavy use of reflection, performing computations and saving their results in Java fields. The results of that work are in most cases platform-independent and thus will not have to be repeated on restore.
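
To make "work in terms of Java" concrete, here is a minimal sketch of such boot-time work. It is purely illustrative and not taken from the PR or the demo: the class `StartupWork`, its field `PUBLIC_METHOD_COUNTS`, and the method `warmUp` are hypothetical names. The code loads classes by name, inspects them via reflection, and keeps the results in an ordinary static field, i.e. state that lives entirely on the Java heap and does not depend on the platform.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the PR.
public class StartupWork {
    // Results of boot-time work, kept in a plain Java field.
    static final Map<String, Integer> PUBLIC_METHOD_COUNTS = new LinkedHashMap<>();

    // Loads each class by name and uses reflection to count its public methods.
    static Map<String, Integer> warmUp(List<String> classNames) throws ClassNotFoundException {
        for (String name : classNames) {
            Class<?> cls = Class.forName(name);   // class loading
            Method[] methods = cls.getMethods();  // reflection
            PUBLIC_METHOD_COUNTS.put(name, methods.length);
        }
        return PUBLIC_METHOD_COUNTS;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Map<String, Integer> counts =
                warmUp(List.of("java.util.ArrayList", "java.lang.String"));
        System.out.println(counts.keySet());
    }
}
```

If a checkpoint were taken after `warmUp` has run, the expectation described above is that the loaded classes and the populated map would be part of the saved state, so this work would not be redone on restore.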

Regarding JIT, the compiled code is indeed currently discarded but, in theory, it could be added to the portable image in the future and used when the source and target environments are similar enough. There are also hopes that Project Leyden, which also aims to reduce startup time, can somehow be integrated with portable CRaC, but since Leyden is currently mostly theoretical I cannot really elaborate on that (I imagine applying Leyden's condensers to the portable image after it has been created, to speed it up for a specific target platform which can differ from the current platform).

As of now, the performance of the proposed method is not impressive, because my current goal is to develop the main parts of the method and make them work correctly. Many possible optimizations are currently out of focus (e.g. restoration is performed on a single thread since it is easier to implement this way, but some of its parts can be performed concurrently). On the Jetty example (which I chose because it appeared to be the least complex to restore; other CRaC examples require some more work) I currently don't see any improvement in time compared to starting the app anew. However, firstly, I expect to improve on that somewhat later by optimizing the code, and secondly, Jetty is probably not the best scenario to demonstrate the performance of portable restoration (e.g. I expect to get better results with Spring since it relies heavily on reflection, but Spring examples are more complex and this proof-of-concept cannot handle them yet).

This PR is meant to show the work being done in this direction. I'm not sure how and when it can be integrated into CRaC, since currently it is more of a research effort that is not yet anywhere near production-ready.

-------------

PR Comment: https://git.openjdk.org/crac/pull/155#issuecomment-2154303641

