[crac] RFR: CRaC may exit before image dump is completed
Anton Kozlov
akozlov at openjdk.org
Wed Feb 22 10:17:55 UTC 2023
On Tue, 21 Feb 2023 16:54:21 GMT, Dan Heidinga <heidinga at openjdk.org> wrote:
>> @DanHeidinga
>> Do you mean "we need to terminate children if the parent process is terminated"?
>> It seems like termination of a process with PID=1 running inside a container will stop a container run, so I guess children will be terminated also. I'd appreciate any reproducible scenario, in case I'm mistaken.
>
> Sorry, that was unclear as I'm fuzzy on the details.
>
> PID 1 is responsible for `wait()`ing on any spawned child processes to ensure they exit before exiting itself. It also needs to respond to `SIGTERM` by propagating it to all its child processes so they have a chance to gracefully shutdown before `SIGKILL` is sent. See [0] and [1]
>
> [0] https://petermalmgren.com/pid-1-child-processes-docker/
> [1] https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
Indeed. Since this process was supposed to be the java process, it should be a good proxy for the real java process, e.g. it should forward SIGTERM and other signals (SIGQUIT to print a stack trace, etc.). That said, this proxy does not necessarily have to be a good init process.
We had the same problem in restore; this code will probably be useful: https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/criuengine/criuengine.c#L263
-------------
PR: https://git.openjdk.org/crac/pull/46