Two ideas and a bug

Thu Sep 5 04:22:26 UTC 2024

Hello! I have been playing with running JRuby on the latest builds of Azul
Zulu + CRaC and I am very impressed!

Baseline "hello world" startup of JRuby improves by 15-20x, which says as
much about our fat boot cycle as it does about CRaC's outstanding restore
performance.

```
~/work/jruby $ jruby --checkpoint
Sep 05, 2024 4:16:39 AM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
Sep 05, 2024 4:16:39 AM jdk.internal.crac.LoggerContainer info
INFO: /home/headius/work/jruby/lib/jruby.jar is recorded as always available on restore
CR: Checkpoint ...
Killed

~/work/jruby $ time jruby --restore -e "puts 'hello'"
hello

real 0m0.110s
user 0m0.111s
sys 0m0.049s

~/work/jruby $ time jruby -e "puts 'hello'"
hello

real 0m1.827s
user 0m5.377s
sys 0m0.183s
```

I was also impressed by how quickly my two previous bugs were fixed after I
reported them to Anton Kozlov (command-line argument quoting issues and
very slow restoration of compressed images).

I have two weird ideas for using CRaC plus a possible bug to report.

* Idea #1: CRaC checkpointing as a really slow JVM fork(2).

JRuby has never been able to support forking the JVM because of challenges
restoring the new process to full functionality: restarting GC and JIT
threads, managing signals and file descriptors, etc. CRaC is already doing
that in order to restore from a checkpoint!

What if I wanted the checkpointed process to keep executing, but also start
a child process by restoring the checkpoint I just acquired? Presto,
super-slow forking!

Am I crazy?
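
For reference, here is a minimal sketch of the fork(2) behavior that Idea #1 would emulate: the parent keeps running, and the child starts from a copy of the parent's already-built state. The method name fork_demo and the warmed_up variable are illustrative only, and the guard returns nil on any runtime where fork is unavailable.

```ruby
# Illustrates the fork semantics a checkpoint-then-restore "fork" would mimic.
# Returns the child's exit status, or nil where fork is unavailable.
def fork_demo
  return nil unless Process.respond_to?(:fork)

  # State built before the fork, standing in for a warmed-up runtime.
  warmed_up = [1, 2, 3].sum

  pid = fork do
    # Child: sees a copy of the parent's state and reports it via exit status.
    exit!(warmed_up)
  end

  # Parent: continues executing and collects the child's status.
  _, status = Process.wait2(pid)
  status.exitstatus
end

result = fork_demo
puts(result.nil? ? "fork unavailable on this runtime" : "child exited with #{result}")
```

A CRaC-based equivalent would replace the fork call with "take a checkpoint, keep going, and restore a second process from that image", which is the part that does not exist today.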

* Idea #2: Incremental checkpointing

I don't know if there's any technical limitation on acquiring a new
checkpoint after restoring from an old checkpoint, but there's one
practical limitation: you can't change the target directory for the new
checkpoint.

I would like to be able to incrementally improve a checkpoint, dumping the
image to a new directory of my choosing each time. This would allow a
checkpoint/restore chain similar to re-forking servers, which base later
forks on the warmed-up children of previous forks. I could provide a
baseline JRuby image that users could customize to their specific
applications and load patterns.

It would seem a checkpointRestore(Path) should be doable, yes?

* Possible bug: overwriting a compressed checkpoint with an uncompressed
checkpoint produces a non-bootable image.

I ran into this while investigating checkpoint compression speed recently,
and Anton suggested I post it here.

```
~/work/jruby $ rm -rf .jruby.checkpoint/

~/work/jruby $ jruby --checkpoint -J-XX:+CRaCImageCompression
Sep 05, 2024 4:14:54 AM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
Sep 05, 2024 4:14:54 AM jdk.internal.crac.LoggerContainer info
INFO: /home/headius/work/jruby/lib/jruby.jar is recorded as always available on restore
CR: Checkpoint ...
Killed

~/work/jruby $ jruby --restore -e "puts 'hello'"
hello

~/work/jruby $ jruby --checkpoint
Sep 05, 2024 4:15:25 AM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
Sep 05, 2024 4:15:25 AM jdk.internal.crac.LoggerContainer info
INFO: /home/headius/work/jruby/lib/jruby.jar is recorded as always available on restore
CR: Checkpoint ...
Killed

~/work/jruby $ jruby --restore -e "puts 'hello'"
pie: 398386: Error (criu/pie/util-vdso.c:92): vdso: ELF header magic mismatch
pie: 398386: Error (criu/pie/restorer.c:2194): Restorer fail 398386
Error (criu/cr-restore.c:2605): Restoring FAILED.
```

You should be able to reproduce this with a build of JRuby
(https://github.com/jruby/jruby) from the "crac" branch.
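
Until this is understood, a possible workaround is to clear the checkpoint directory before writing a new image, as the first command in the transcript does by hand. The sketch below assumes the failure comes from stale files left over from the compressed image, which the transcript suggests but does not prove. The helper name fresh_checkpoint and its parameters are hypothetical; the default directory and command are the ones shown in the transcript.

```ruby
require 'fileutils'

# Hypothetical helper: remove any previous checkpoint image, then run the
# checkpoint command. Returns whatever Kernel#system returns for the command.
def fresh_checkpoint(dir = '.jruby.checkpoint', command: ['jruby', '--checkpoint'])
  # Assumption: leftovers from an earlier (compressed) image are what break
  # the later restore, so start from an empty location every time.
  FileUtils.rm_rf(dir)
  system(*command)
end
```

Calling fresh_checkpoint with no arguments mirrors running "rm -rf .jruby.checkpoint/" followed by "jruby --checkpoint".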

Thanks for your work!

*Charles Oliver Nutter*
*Architect and Technologist*
Headius Enterprises
https://www.headius.com
headius at headius.com