[crac] RFR: CRaC may exit before image dump is completed

Anton Kozlov akozlov at openjdk.org
Wed Feb 22 10:17:55 UTC 2023


On Tue, 21 Feb 2023 16:54:21 GMT, Dan Heidinga <heidinga at openjdk.org> wrote:

>> @DanHeidinga 
>> Do you mean "we need to terminate children if the parent process is terminated"?
>> It seems like termination of a process with PID=1 running inside a container will stop a container run, so I guess children will be terminated also. I'd appreciate any reproducible scenario, in case I'm mistaken.
>
> Sorry, that was unclear as I'm fuzzy on the details.
> 
> PID 1 is responsible for `wait()`ing on any spawned child processes to ensure they exit before exiting itself.  It also needs to respond to `SIGTERM` by propagating it to all its child processes so they have a chance to gracefully shutdown before `SIGKILL` is sent.  See [0] and [1]
> 
> [0] https://petermalmgren.com/pid-1-child-processes-docker/
> [1] https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

Indeed. Since this process was supposed to be the java process, it should be a good proxy for the real java process, e.g. it should forward SIGTERM and other signals (SIGQUIT to print a stack trace, etc.). It does not necessarily have to be a good init process, though.

We had the same problem in restore; this code will probably be useful: https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/criuengine/criuengine.c#L263

-------------

PR: https://git.openjdk.org/crac/pull/46
