[crac] RFR: Support repeated checkpoint and restore operations
Radim Vansa
duke at openjdk.org
Fri Apr 14 08:32:06 UTC 2023
On Thu, 13 Apr 2023 15:08:33 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:
>> * VM option CRaCCheckpointTo is recognized when restoring the application (destination can be changed)
>> * The main problem for checkpoint after restore was old checkpoint image mmapped to files (CRaC-specific CRIU optimization for faster boot). Before performing checkpoint we transparently swap this with memory using anonymous mapping.
>
> src/hotspot/os/linux/os_linux.cpp line 6383:
>
>> 6381: bool ok = !_dry_run;
>> 6382:
>> 6383: remap_old_imagedir();
>
> Until now the VM was not concerned with how CREngine saved the memory content. The mmapping is an implementation detail of the CR mechanism.
>
> Have you considered switching off the mmapping in CRIU in this repeated checkpoint-restore sequence? Assuming we would be able to communicate that to CREngine (in CRIU, mmapping is an option).
>
> Semantically, this patch proposes to handle a mapping twice: once in CRIU with mmapping and another time in the VM. There are some benefits to doing everything in the VM and having better control over the process. So it would be cleaner to do most of the memory management in the VM and leave only bootstrapping to CRIU.
I think that mmapping in CRIU is an important optimization that speeds up boot, so I did not want to force disabling it just because you might want to checkpoint again later.
You're right that this mixes abstractions and responsibilities. There's an alternative solution that would not require changes in the VM, but it's technically more complex: we could ptrace the VM and replace the mapping for it externally (though I am not sure how exactly we should invoke a syscall on behalf of the tracee; maybe parasite code would be needed?). Since the ptracing process needs elevated privileges, we should probably add this as a separate criu command that would be invoked by criuengine.
The advantage of the external approach is clearer semantics and not relying on SIGSEGV handling here, which some consider a sketchy practice. The cost is a more complex solution.
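To make the "swap the file-backed mapping with anonymous memory" idea concrete, here is a minimal Linux sketch. It is not the code from the patch: `replace_with_anonymous` and `demo_replace_file_mapping` are hypothetical names, it does not use SIGSEGV handling, and it assumes the range is readable and that no other thread writes to it during the copy (as would be the case while the VM is stopped for the checkpoint).

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for mremap() and MREMAP_FIXED
#endif
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not the patch's remap_old_imagedir()): replaces the
// mapping at [addr, addr + len) with anonymous memory holding the same bytes.
// Assumes addr is page-aligned, the range is readable, and nobody writes to
// it concurrently.
static bool replace_with_anonymous(void* addr, size_t len, int prot) {
  assert(reinterpret_cast<uintptr_t>(addr) % sysconf(_SC_PAGESIZE) == 0);

  // 1. Copy the current content into a temporary anonymous mapping.
  void* tmp = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (tmp == MAP_FAILED) return false;
  memcpy(tmp, addr, len);
  if (mprotect(tmp, len, prot) != 0) { munmap(tmp, len); return false; }

  // 2. Move the copy over the original address. With MREMAP_FIXED the kernel
  //    unmaps whatever was mapped there (here: the file-backed pages).
  void* res = mremap(tmp, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, addr);
  if (res == MAP_FAILED) { munmap(tmp, len); return false; }
  return res == addr;
}

// Demo: map one page of a temporary file privately, swap it for anonymous
// memory, and check that the content survived and is still writable.
static bool demo_replace_file_mapping() {
  const size_t len = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  char path[] = "/tmp/remap-sketch-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return false;
  unlink(path);  // the file lives on only through the open descriptor

  const std::string content(len, 'A');
  if (write(fd, content.data(), len) != static_cast<ssize_t>(len)) {
    close(fd);
    return false;
  }
  void* m = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  close(fd);
  if (m == MAP_FAILED) return false;
  char* p = static_cast<char*>(m);

  bool ok = replace_with_anonymous(p, len, PROT_READ | PROT_WRITE);
  ok = ok && p[0] == 'A' && p[len - 1] == 'A';
  if (ok) {
    p[0] = 'B';
    ok = (p[0] == 'B');
  }
  munmap(p, len);
  return ok;
}
```

A real implementation would have to walk all mappings that still point into the old image directory and preserve each one's protection flags; the sketch handles a single range only.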
-------------
PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1166487358