Two ideas and a bug
Charles Oliver Nutter
headius at headius.com
Tue Sep 10 15:21:31 UTC 2024
Replies below
On 06. 09. 24 15:56, Anton Kozlov wrote:
> > Technically this is possible, and it looks reasonable. One of the
> > primary use-cases for CRaC is to be able to quickly scale java
> > instances, so forking in that way will be used for scale java
> > processes on a single machine.
> >
> > Right now if you export CRAC_CRIU_LEAVE_RUNNING=1 environment var, the
> > original process will be kept alive. Then you should be able to
> > restore from the (temporary?) image.
>
Very interesting! I will have to play with this a bit more. Are these env
vars etc. documented anywhere other than the code itself?
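For anyone wanting to try the suggestion above, here is a minimal sketch of launching a JVM with that variable set. Only CRAC_CRIU_LEAVE_RUNNING comes from this thread; the image directory and the main class name "Main" are placeholders, and the sketch assumes a CRaC-enabled JDK is what `java` resolves to. It only builds the launcher and prints it; calling start() on the result would actually run the process.

```java
import java.nio.file.Path;
import java.util.List;

class LeaveRunningLauncher {
    // Builds (but does not start) a launcher for a CRaC-enabled JVM.
    // imageDir and mainClass are placeholders chosen for this example.
    static ProcessBuilder checkpointLauncher(Path imageDir, String mainClass) {
        ProcessBuilder pb = new ProcessBuilder(List.of(
                "java",
                "-XX:CRaCCheckpointTo=" + imageDir, // where the image is written
                mainClass));
        // Per the thread: keeps the original process alive after the checkpoint.
        pb.environment().put("CRAC_CRIU_LEAVE_RUNNING", "1");
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = checkpointLauncher(Path.of("crac-image"), "Main");
        System.out.println(pb.command());
        System.out.println("CRAC_CRIU_LEAVE_RUNNING="
                + pb.environment().get("CRAC_CRIU_LEAVE_RUNNING"));
    }
}
```

The checkpoint itself would still have to be triggered separately, for example with `jcmd <pid> JDK.checkpoint` or from inside the program.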
> I can confirm this. It's very likely caused by us not cleaning the
> target directory, and the fact we detect the type of the image by the
> presence of the compressed part. So we'll track this under
> https://bugs.openjdk.org/browse/JDK-8339663.
Great, thanks! I'll hold off on including compression in my CRaC blog
series until I know this is fixed.
From Radim:
> The second restore would fail currently; CRIU will attempt to restore
> with the same PID/TIDs as the running instance and that will fail. I
> think that Anton experimented in the past with CRIU allowing to restore
> at different PIDs, and from Java POV this is mostly OK. But I think that
> this would require modifications in CRIU, some ugly code that would be
> dependent on GLIBC version - the thread IDs are stored somewhere on the
> beginning of stack.
>
That does sound a bit nasty. Is there ongoing work to make this possible in
CRIU?
> However, if you'd be OK with restoring it in a new cgroups namespace,
> PID conflicts could be avoided.
>
I am not familiar with cgroups but I will look into it. I suppose this is
also a good example of why using Docker, where each launch starts from a
fresh PID space, is a more flexible way to use CRaC.
> There's one more aspect to this: with default configuration we use in
> CRaC the files in image directory are mmaped into memory
...
> eventually getting dependent on all the images, not only
> the last one.
>
...
>
> Again, the solution exists: there can be a background thread in JVM
> concurrently copying the bits to a new chunk of memory and mremapping
> that into original place.
...
> normally this should be the job for CRIU but now it is
> something that runs inside JVM (so part of the JVM? parasite thread
> injected by CRIU? dealing with this externally through ptrace API?).
>
I'd like to understand the boundaries here better, both technically and
project-wise. What work is being done in CRIU to address things like pid
relocation and memory-mapping?
The picture in my head of how I'd like it to behave would obviously have
each new checkpoint override previous ones, so that it's based only on
itself and not any externally-mapped memory. Or, perhaps, some way to
reduce the footprint of incremental checkpoints so they know part of the
image is already in a previous checkpoint? Presumably much of the data is
being mmapped as read-only so could be reused? Incremental checkpoints as
just an overlay on previous ones?
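To make the two options concrete, here is a toy model of "overlay" images, written as a sketch only. It is not how CRIU or CRaC store images; the Image class, the page numbers, and the string page contents are all invented for illustration. Each image holds only the pages that changed and points at its parent, so a restore depends on the whole chain; flatten() copies everything visible into one self-contained image.

```java
import java.util.HashMap;
import java.util.Map;

class OverlayCheckpoints {
    /** One image: the pages changed since its parent, plus a link to the parent. */
    static final class Image {
        final Image parent;            // null for a full (base) image
        final Map<Long, String> pages; // page number -> contents (a String stands in for page data)

        Image(Image parent, Map<Long, String> pages) {
            this.parent = parent;
            this.pages = Map.copyOf(pages);
        }

        /** Looks a page up here first, then falls through to older images. */
        String read(long page) {
            for (Image i = this; i != null; i = i.parent) {
                String data = i.pages.get(page);
                if (data != null) return data;
            }
            return null; // page was never written
        }

        /** Number of images a restore from this one depends on. */
        int chainLength() {
            int n = 0;
            for (Image i = this; i != null; i = i.parent) n++;
            return n;
        }

        /** Copies every visible page into a new image with no parent. */
        Image flatten() {
            Map<Long, String> merged = new HashMap<>();
            // Walk newest to oldest; putIfAbsent keeps the newest version of each page.
            for (Image i = this; i != null; i = i.parent) {
                i.pages.forEach(merged::putIfAbsent);
            }
            return new Image(null, merged);
        }
    }

    public static void main(String[] args) {
        Image base = new Image(null, Map.of(0L, "code", 1L, "heap-v1"));
        Image second = new Image(base, Map.of(1L, "heap-v2"));
        Image third = new Image(second, Map.of(2L, "heap-new"));
        System.out.println(third.read(0) + " via chain of " + third.chainLength());
        Image flat = third.flatten();
        System.out.println(flat.read(1) + " via chain of " + flat.chainLength());
    }
}
```

The overlay form keeps each new image small but ties it to every earlier one; the flattened form costs a full copy but stands alone, which is the trade-off being asked about.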
>>
> >> It would seem a checkpointRestore(Path) should be doable, yes?
>
> I totally second the suggestion to have this in the API; however there
> must be a practical application for the second checkpoint given the
> problem above.
>
I think this API would be valid even without repeated checkpoints.
Statically configuring the checkpoint target via JVM flags is not ideal. My
program should be able to participate in choosing the location of the
image, so I can adjust that location based on usage and report back to the
user exactly where it is once captured.
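As a rough sketch of what that could look like from the program's side: the Checkpointer interface below is hypothetical and merely stands in for the proposed checkpointRestore(Path); no such overload is claimed to exist today. The cache layout (root/app/version) and the names "myapp" and "1.0" are likewise made up for the example, and the stub only prints instead of checkpointing.

```java
import java.nio.file.Path;

class PathCheckpointSketch {
    /** Hypothetical: stands in for the checkpointRestore(Path) proposed in this thread. */
    interface Checkpointer {
        void checkpointRestore(Path imageDir) throws Exception;
    }

    /** The program, not a JVM flag, decides where the image goes (layout is an assumption). */
    static Path imageDirFor(Path cacheRoot, String appName, String appVersion) {
        return cacheRoot.resolve(appName).resolve(appVersion);
    }

    /** Requests the checkpoint and returns a message naming the chosen location. */
    static String checkpointAndReport(Checkpointer cp, Path imageDir) throws Exception {
        cp.checkpointRestore(imageDir);
        return "Checkpoint image location: " + imageDir;
    }

    public static void main(String[] args) throws Exception {
        // Stub implementation so the sketch runs on any JDK.
        Checkpointer stub = dir -> System.out.println("(stub) would checkpoint to " + dir);
        Path dir = imageDirFor(Path.of("cache"), "myapp", "1.0");
        System.out.println(checkpointAndReport(stub, dir));
    }
}
```

Because the location is computed at run time, it can vary per application, version, or user, and the program always knows the exact path to report.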
> Anyway, thanks for these ideas. I believe that it's important to keep a
> big picture of all the use-cases for CRaC in mind, rather than thinking
> just about quick microservice startup somewhere in the cloud. JRuby can
> definitely bring a different set of problems to the discussion; we just
> need to crac(k) them :)
>
I predict great things here, even with the current limitations!
- Charlie