Two ideas and a bug
Anton Kozlov
akozlov at azul.com
Fri Sep 6 13:56:49 UTC 2024
On 9/5/24 7:22 AM, Charles Oliver Nutter wrote:
> Baseline "hello world" startup of JRuby improves by 15-20x, which says as much about our fat boot cycle as it does about CRaC's outstanding restore performance.
Thank you for sharing such an awesome result!
> * Idea #1: CRaC checkpointing as a really slow JVM fork(2).
>
> JRuby has never been able to support forking the JVM because of challenges restoring the new process to full functionality: restarting GC and JIT threads, managing signals and file descriptors, etc. CRaC is already doing that in order to restore from a checkpoint!
>
> What if I wanted the checkpoint process to keep executing, but start up a child process by restoring the checkpoint I just acquired? Presto, super-slow forking!
Technically this is possible, and it looks reasonable. One of the primary use cases for CRaC is quickly scaling Java instances, so forking this way could be used to scale Java processes on a single machine.
Right now, if you export the environment variable CRAC_CRIU_LEAVE_RUNNING=1, the original process is kept alive after the checkpoint. You should then be able to restore from the (temporary?) image.
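As a rough illustration of the "slow fork" flow, a launcher could build the two command lines like this. It is only a sketch: the class and method names are made up for the example, the `java` binary is assumed to be a CRaC-enabled JDK, and how the checkpoint itself gets triggered is left out. The code only constructs the commands and does not start any process.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CRaC: builds the commands for a
// "checkpoint, keep running, restore a copy" flow.
public class CracForkSketch {

    // Command for the original process. CRAC_CRIU_LEAVE_RUNNING=1 asks for the
    // process to be kept alive after the checkpoint is written.
    public static ProcessBuilder checkpointingRun(String javaBin, Path imageDir, String... appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-XX:CRaCCheckpointTo=" + imageDir);
        cmd.addAll(List.of(appArgs));
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.environment().put("CRAC_CRIU_LEAVE_RUNNING", "1");
        return pb;
    }

    // Command for the "forked" copy: a restore from the image just written.
    public static ProcessBuilder restoreRun(String javaBin, Path imageDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-XX:CRaCRestoreFrom=" + imageDir);
        return new ProcessBuilder(cmd);
    }

    public static void main(String[] args) {
        Path image = Path.of("img1");
        // Print the commands instead of running them; running needs a CRaC JDK.
        System.out.println(String.join(" ", checkpointingRun("java", image, "Test.java").command()));
        System.out.println(String.join(" ", restoreRun("java", image).command()));
    }
}
```

Calling `start()` on the returned builders would launch the processes; the restore should only be started once the image directory has been fully written.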
> * Idea #2: Incremental checkpointing
>
> I don't know if there's any technical limitation on acquiring a new checkpoint after restoring from an old checkpoint, but there's one practical limitation: you can't change the target directory for the new checkpoint.
>
> I would like to be able to incrementally improve a checkpoint, dumping the image to a new directory of my choosing each time. This would allow a checkpoint/restore chain similar to re-forking servers, which base later forks on the warmed-up children of previous forks. I could provide a baseline JRuby image that users could customize to their specific applications and load patterns.
>
> It would seem a checkpointRestore(Path) should be doable, yes?
CRaCCheckpointTo can be set on restore, along with a few other options. But there is at least one bug: the second checkpoint chooses a wrong destination path, as the run below shows. We'll investigate this under https://bugs.openjdk.org/browse/JDK-8339662.
$JAVA -XX:CRaCCheckpointTo=img1 -DpreLoop Test.java
init
start
stage 1: 1
stage 1: 2
stage 1: 3
Sep 06, 2024 12:33:10 PM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
beforeCheckpoint
Killed
$JAVA -XX:CRaCRestoreFrom=img1 -XX:CRaCCheckpointTo=asdf
afterRestore
stage 1: 4
stage 1: 5
stage 1: 6
stage 1: 7
Sep 06, 2024 12:33:17 PM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
beforeCheckpoint
Error (criu/image.c:577): Can't open dir ubuntu: No such file or directory
Error (criu/crtools.c:237): Couldn't open image dir ubuntu
...
stage 1: 8
stage 1: 9
stage 1: 10
> * Possible bug: overwriting a compressed checkpoint with an uncompressed checkpoint produces a non-bootable image.
I can confirm this. It is very likely caused by our not cleaning the target directory, combined with the fact that we detect the type of the image by the presence of the compressed part. We'll track this under https://bugs.openjdk.org/browse/JDK-8339663.
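To make the suspected mechanism concrete, here is a small self-contained simulation. It does not use CRaC or CRIU at all: the file names are invented for the example and do not reflect the real image layout. It only shows how presence-based type detection can be fooled by a file left over from an earlier checkpoint, and how cleaning the directory first avoids that.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Simulation only; names below are hypothetical, not the real image format.
public class StaleImageDemo {
    static final String COMPRESSED_PART = "pages.img.lz4"; // assumed name
    static final String PLAIN_PART = "pages.img";          // assumed name

    // Writes a fake image into dir, optionally deleting old files first.
    public static void writeImage(Path dir, boolean compressed, boolean cleanFirst) throws IOException {
        Files.createDirectories(dir);
        if (cleanFirst) {
            List<Path> leftovers;
            try (Stream<Path> entries = Files.list(dir)) {
                leftovers = entries.collect(Collectors.toList());
            }
            for (Path p : leftovers) {
                Files.delete(p); // the demo directory contains only plain files
            }
        }
        Files.writeString(dir.resolve(compressed ? COMPRESSED_PART : PLAIN_PART), "data");
    }

    // Detects the image type purely by the presence of the compressed part.
    public static boolean looksCompressed(Path dir) {
        return Files.exists(dir.resolve(COMPRESSED_PART));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crac-image-demo");
        writeImage(dir, true, false);   // first checkpoint: compressed
        writeImage(dir, false, false);  // second checkpoint: uncompressed, no cleanup
        System.out.println("no cleanup, detected compressed: " + looksCompressed(dir));
        writeImage(dir, false, true);   // same, but cleaning the directory first
        System.out.println("with cleanup, detected compressed: " + looksCompressed(dir));
    }
}
```

Without cleanup the uncompressed image is still detected as compressed (the first line prints `true`); with cleanup the second line prints `false`.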
> You should be able to reproduce this with a build of JRuby (https://github.com/jruby/jruby) from the "crac" branch.
Thank you very much for all the feedback!
-- Anton