[crac] RFR: CRaC may exit before image dump is completed [v3]

Roman Marchenko rmarchenko at openjdk.org
Tue Mar 7 08:35:33 UTC 2023


On Mon, 6 Mar 2023 12:40:27 GMT, Roman Marchenko <rmarchenko at openjdk.org> wrote:

>> src/java.base/share/native/launcher/main.c line 120:
>> 
>>> 118:         pid = wait(&st);
>>> 119:         if (pid == g_child_pid && WIFEXITED(st)) {
>>> 120:             status = WEXITSTATUS(st);
>> 
>> Sorry for nit-picking, but now if java was killed (`WIFEXITED == false`) we won't update status and will return `0`, which does not look correct. `restorewait` in this situation returns `1` [1], which, although better, does not look perfect either. Here I suggest being at least consistent with restorewait.
>> 
>> Or we can fix restorewait as well, indicating being killed by returning `128+signal`, as described in the bash manual [2]. How does it sound?
>> 
>>> When a command terminates on a fatal signal N, bash uses the value of 128+N as the exit status.
>> 
>> [1] https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/criuengine/criuengine.c#L306
>> [2] https://linux.die.net/man/1/bash
>
> @AntonKozlov 
> I personally would prefer to exit 0 on checkpoint to indicate that the process finished successfully. On the other hand, I have no idea how we can recognize cases where the child process is actually killed by someone else. So I agree with the idea of returning an appropriate code 128+N, and doing the same for restorewait, to keep it consistent.

To make things iterative, I suggest implementing wait_for_children() the same way as restorewait() for now, and then opening a follow-up PR for the changes related to signal handling and returned exit codes.
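
To illustrate the two options discussed above, here is a minimal sketch, not the actual main.c or criuengine.c code. The names `exit_code_from_wait_status`, `wait_for_child` and the `signal_aware` flag are hypothetical and exist only for this example. With `signal_aware == 0` a killed child maps to `1`, which is how restorewait is described in this thread; with `signal_aware != 0` it maps to `128+N`, following the bash convention.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a wait() status to a launcher exit code. */
static int exit_code_from_wait_status(int st, int signal_aware) {
    if (WIFEXITED(st)) {
        /* Normal termination: pass the child's exit code through. */
        return WEXITSTATUS(st);
    }
    if (WIFSIGNALED(st)) {
        /* Killed by signal N: either 128+N (bash convention) or a plain 1. */
        return signal_aware ? 128 + WTERMSIG(st) : 1;
    }
    /* Anything else is treated as a failure in this sketch. */
    return 1;
}

/* Hypothetical stand-in for wait_for_children(): reap every child and
 * report the exit code of the one we care about. */
static int wait_for_child(pid_t child, int signal_aware) {
    int status = 1; /* assumed default if the child is never reaped */
    int st;
    pid_t pid;

    do {
        pid = wait(&st);
        if (pid == child) {
            status = exit_code_from_wait_status(st, signal_aware);
        }
        /* Keep going while children remain or the call was interrupted. */
    } while (pid > 0 || (pid == -1 && errno == EINTR));

    return status;
}
```

Under these assumptions, a child that calls `_exit(7)` yields `7` in both modes, while a child terminated by SIGKILL yields `1` in the restorewait-like mode and `128 + SIGKILL` in the signal-aware mode.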

-------------

PR: https://git.openjdk.org/crac/pull/46


More information about the crac-dev mailing list