<div dir="ltr"><div dir="ltr"><div>Replies below</div><div><br></div><div>On 06. 09. 24 15:56, Anton Kozlov wrote:</div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Technically this is possible, and it looks reasonable. One of the <br>
primary use-cases for CRaC is to be able to quickly scale Java <br>
instances, so forking in that way will be used to scale Java <br>
processes on a single machine.<br>
<br>
Right now if you export the CRAC_CRIU_LEAVE_RUNNING=1 environment variable, the <br>
original process will be kept alive. Then you should be able to <br>
restore from the (temporary?) image.<br></blockquote><div><br></div><div>Very interesting! I will have to play with this a bit more. Are these environment variables documented anywhere other than in the code itself? </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I can confirm this. It's very likely caused by us not cleaning the<br>target directory, and the fact we detect the type of the image by the<br>presence of the compressed part. So we'll track this under<br><a href="https://bugs.openjdk.org/browse/JDK-8339663" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8339663</a>.<br></blockquote><div><br></div><div>Great, thanks! I'll hold off on including compression in my CRaC blog series until I know this is fixed.</div><div><br></div><div>From Radim:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

The second restore would fail currently; CRIU will attempt to restore <br>

with the same PID/TIDs as the running instance and that will fail. I <br>

think that Anton experimented in the past with CRIU allowing to restore <br>

at different PIDs, and from Java POV this is mostly OK. But I think that <br>

this would require modifications in CRIU, some ugly code that would be <br>

dependent on GLIBC version - the thread IDs are stored somewhere on the <br>

beginning of stack.<br></blockquote><div><br></div><div>That does sound a bit nasty. Is there ongoing work to make this possible in CRIU? </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

However, if you'd be OK with restoring it in a new cgroups namespace, <br>

PID conflicts could be avoided.<br></blockquote><div><br></div><div>I am not familiar with cgroups, but I will look into it. I suppose this is also a good example of why running each launch as PID 1 in its own Docker container is a more flexible way to use CRaC. </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">There's one more aspect to this: with default configuration we use in <br>

CRaC the files in image directory are mmaped into memory</blockquote><div>... </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">eventually getting dependent on all the images, not only <br>

the last one.<br></blockquote><div>... <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Again, the solution exists: there can be a background thread in JVM <br>

concurrently copying the bits to a new chunk of memory and mremapping <br>

that into original place.</blockquote><div>... <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">normally this should be the job for CRIU but now it is <br>

something that runs inside JVM (so part of the JVM? parasite thread <br>

injected by CRIU? dealing with this externally through ptrace API?).<br></blockquote><div><br></div><div>I'd like to understand the boundaries here better, both technically and project-wise. What work is being done in CRIU to address things like PID relocation and memory mapping?</div><div><br></div><div>Ideally, I'd like each new checkpoint to replace the previous ones, so that it depends only on its own image and not on any externally mapped memory. Or, perhaps, is there some way to reduce the footprint of incremental checkpoints, so that they know part of the image already exists in a previous checkpoint? Presumably much of the data is mmapped read-only and so could be reused? Could incremental checkpoints be just an overlay on previous ones?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

>><br>

>> It would seem a checkpointRestore(Path) should be doable, yes?<br><br>

I totally second the suggestion to have this in the API; however there <br>

must be a practical application for the second checkpoint given the <br>

problem above.<br></blockquote><div><br></div><div>I think this API would be valid even without repeated checkpoints. Statically configuring the checkpoint target via JVM flags is not ideal. My program should be able to participate in choosing the location of the image, so I can adjust that location based on usage and report back to the user exactly where it is once captured. </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Anyway, thanks for these ideas. I believe that it's important to keep a <br>

big picture of all the use-cases for CRaC in mind, rather than thinking <br>

just about quick microservice startup somewhere in the cloud. JRuby can <br>

definitely bring a different set of problems to the discussion; we just <br>

need to crac(k) them :)<br></blockquote><div><br></div><div>I predict great things here, even with the current limitations!</div><div><br></div><div>- Charlie</div></div></div>
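<div><br></div><div>To make the checkpointRestore(Path) request above a bit more concrete, a minimal sketch of the shape I have in mind follows. Everything in it is an assumption for illustration only: the Checkpointer interface, the chooseImageDir and checkpointTo helpers, and the "-checkpoint" directory naming are hypothetical, and the stub just prints the path instead of calling into CRaC.</div>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CheckpointTargetSketch {
    /** Hypothetical stand-in for the proposed API; not an existing CRaC interface. */
    interface Checkpointer {
        void checkpointRestore(Path imageDir) throws Exception;
    }

    /** Picks a per-application image directory under the given base and creates it. */
    static Path chooseImageDir(Path base, String appName) throws IOException {
        Path dir = base.resolve(appName + "-checkpoint");
        Files.createDirectories(dir);
        return dir;
    }

    /** Checkpoints into the chosen directory and returns it so the caller can report it. */
    static Path checkpointTo(Checkpointer cp, Path base, String appName) throws Exception {
        Path dir = chooseImageDir(base, appName);
        cp.checkpointRestore(dir);
        return dir;
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("crac-images");
        // Stub only: a real implementation would hand the path to the CRaC runtime.
        Checkpointer stub = dir -> System.out.println("would checkpoint to " + dir);
        Path image = checkpointTo(stub, base, "jruby");
        System.out.println("image captured at " + image);
    }
}
```

<div>The only point of the sketch is that the program, rather than a JVM flag, picks the image directory at run time and can tell the user exactly where the image ended up.</div>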