[crac] RFR: PID adjustment on checkpoint [v10]
Anton Kozlov
akozlov at openjdk.org
Thu Jun 29 16:16:41 UTC 2023
On Wed, 28 Jun 2023 08:54:41 GMT, Roman Marchenko <rmarchenko at openjdk.org> wrote:
>> On restore, PID conflicts are possible if the process was checkpointed in a container, because PIDs inside a container are small. Therefore, when checkpointing in a container, we need to advance the PID used for new processes to a chosen value to avoid conflicts on restore.
>>
>> See https://github.com/CRaC/example-lambda/blob/master/checkpoint.cmd.sh#L8 for example.
>>
>> This PR implements functionality similar to the example above, so that it works out of the box. By default, when checkpointing, the PID is adjusted only if Java's PID is 1, which means Java is running in a container. To adjust the PID manually for a checkpointed process, use the `-XX:CRaCMinPid=<value>` option along with `CRaCCheckpointTo`. The minimum `CRaCMinPid` value is 1 and the maximum is `UINT_MAX`, although in practice it is limited by the OS's `pid_max`.
>>
>> There are the following possible scenarios for CRaC running in a container:
>>
>> // getpid   CRaCMinPid  | set_last_pid   fork
>> // ------------------------------------------
>> // 1        -           | yes (default)  yes
>> // 1        1           | no             yes
>> // 1        >1          | yes            yes
>> // >1       -           | no             no
>> // >1       <=getpid    | no             no
>> // >1       >getpid     | yes            yes
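
One way to read the table above is as a small pure function. The sketch below is not taken from the PR: `plan_pid_adjustment`, `struct pid_plan`, and the use of 0 to mean "`CRaCMinPid` not given" are all hypothetical names and conventions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: which of the two actions from the table apply. */
struct pid_plan {
    bool set_last_pid; /* advance the PID counter before checkpoint */
    bool do_fork;      /* the "fork" column of the table */
};

/*
 * Hypothetical encoding of the scenario table.
 * self_pid: result of getpid() (always >= 1).
 * min_pid:  requested CRaCMinPid, or 0 if the option was not given
 *           (assumption of this sketch; valid values start at 1).
 */
static struct pid_plan plan_pid_adjustment(long self_pid, long min_pid) {
    assert(self_pid >= 1 && min_pid >= 0);
    struct pid_plan plan;
    bool min_pid_given = min_pid > 0;
    /* Explicit option: adjust only if it is above our own PID.
     * No option: adjust by default only when running as PID 1. */
    plan.set_last_pid = min_pid_given ? (min_pid > self_pid) : (self_pid == 1);
    /* Per the table, PID 1 always forks; otherwise fork only when adjusting. */
    plan.do_fork = (self_pid == 1) || plan.set_last_pid;
    return plan;
}
```

For example, `plan_pid_adjustment(1, 1)` gives `set_last_pid == false` and `do_fork == true`, matching the second row.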
>
> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision:
>
> Added FIXME for further steps
src/java.base/share/native/launcher/main.c line 213:
> 211: static void spin_last_pid(int pid) {
> 212: const int MaxSpinCount = pid < 1000 ? 1000 : pid;
> 213: for (int child = fork(), prev = 0, cnt = MaxSpinCount; child < pid; child = fork(), --cnt) {
Since `waitpid` is called only if `child < pid`, does this mean the last child, the one that satisfies the pid requirement, is left unwaited?
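
To illustrate the concern, here is a minimal sketch of a spin loop that reaps every child before it tests the exit condition, so the final child is waited for as well. It is not the code from the PR: `spin_until_pid` and its `max_spins` bound are hypothetical, and PID wraparound is handled only by that bound.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork short-lived children until one of them gets a
 * PID >= target_pid, or until max_spins children have been created.
 * Every child is reaped, including the one that ends the loop.
 * Returns the number of children forked, or -1 if fork() failed.
 */
static int spin_until_pid(pid_t target_pid, int max_spins) {
    assert(target_pid > 0 && max_spins > 0);
    int forked = 0;
    for (int i = 0; i < max_spins; ++i) {
        pid_t child = fork();
        if (child < 0) {
            return -1;      /* fork failed; nothing left to reap */
        }
        if (child == 0) {
            _exit(0);       /* child: exit immediately */
        }
        ++forked;
        /* Reap unconditionally, retrying if interrupted by a signal. */
        int status;
        while (waitpid(child, &status, 0) < 0 && errno == EINTR) {
        }
        /* Only now decide whether to stop, so no zombie is left behind. */
        if (child >= target_pid) {
            break;
        }
    }
    return forked;
}
```

After a call such as `spin_until_pid(getpid() + 3, 16)`, a `waitpid(-1, NULL, WNOHANG)` in the parent should fail with `ECHILD`, assuming the parent had no other children.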
-------------
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1245104030