[crac] RFR: PID adjustment on checkpoint [v10]

Anton Kozlov akozlov at openjdk.org
Thu Jun 29 16:16:41 UTC 2023


On Wed, 28 Jun 2023 08:54:41 GMT, Roman Marchenko <rmarchenko at openjdk.org> wrote:

>> On restore, PID conflicts may occur if the process was checkpointed in a container, because PIDs there are small. Therefore, when checkpointing in a container, we need to move the PID for new processes to a particular value to avoid conflicts on restore.
>> 
>> See https://github.com/CRaC/example-lambda/blob/master/checkpoint.cmd.sh#L8 for an example.
>> 
>> This PR implements functionality similar to the example above so that it works out of the box. By default, when checkpointing, the PID is adjusted only if Java's PID is 1, which indicates that Java is running in a container. To adjust the PID manually for a checkpointed process, use the `-XX:CRaCMinPid=<value>` option along with `CRaCCheckpointTo`. The minimum `CRaCMinPid` value is 1 and the maximum is `UINT_MAX`, although in practice it is limited by the OS's `pid_max`.
>> 
>> There are the following possible scenarios for CRaC running in a container:
>> 
>>     // getpid   CRaCMinPid  |   set_last_pid      fork
>>     // ------------------------------------------------
>>     //   1         -        |    yes (default)    yes
>>     //   1         1        |    no               yes
>>     //   1        >1        |    yes              yes
>>     //   >1        -        |    no               no
>>     //   >1      <=getpid   |    no               no
>>     //   >1       >getpid   |    yes              yes
>
> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Added  FIXME for further steps
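
For reference, the scenario table quoted above can be read as a small decision function. This is only a sketch of the table, not the code in the PR: the names `pid_plan` and `plan_pid_adjustment` are hypothetical, and `min_pid == 0` is an assumed encoding for "`CRaCMinPid` not given" (the `-` rows).

```c
#include <stdbool.h>

/* Hypothetical type: which of the two table columns apply. */
struct pid_plan {
    bool set_last_pid;  /* "set_last_pid" column */
    bool do_fork;       /* "fork" column */
};

/* self_pid: result of getpid(); min_pid: CRaCMinPid, 0 meaning "not given"
 * (an assumption made for this sketch only). */
static struct pid_plan plan_pid_adjustment(int self_pid, unsigned min_pid) {
    struct pid_plan plan = { false, false };
    if (self_pid == 1) {
        /* PID 1: always fork; skip set_last_pid only for CRaCMinPid=1. */
        plan.do_fork = true;
        plan.set_last_pid = (min_pid != 1);
    } else {
        /* PID > 1: act only when CRaCMinPid is above the current PID. */
        bool raise = min_pid > (unsigned)self_pid;
        plan.set_last_pid = raise;
        plan.do_fork = raise;
    }
    return plan;
}
```

With this encoding, `plan_pid_adjustment(1, 0)` corresponds to the first row and `plan_pid_adjustment(42, 43)` to the last.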

src/java.base/share/native/launcher/main.c line 213:

> 211: static void spin_last_pid(int pid) {
> 212:     const int MaxSpinCount = pid < 1000 ? 1000 : pid;
> 213:     for (int child = fork(), prev = 0, cnt = MaxSpinCount; child < pid; child = fork(), --cnt) {

Since `waitpid` is called only if `child < pid`, does this mean that the last child, the one that satisfies the PID requirement, is left unwaited?
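
To illustrate the concern, here is a minimal sketch of a spin loop that reaps every child, including the one that finally reaches the target PID. It is not the code in the PR: `spin_until_pid` and its `max_spin` parameter are hypothetical names, and it assumes each child exits immediately.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork short-lived children until one gets a PID >= target_pid or until
 * max_spin forks have been made. Every child is waited for before the
 * PID check, so the last one is reaped too.
 * Returns the PID of the last child, or -1 if no fork succeeded. */
static pid_t spin_until_pid(pid_t target_pid, int max_spin) {
    pid_t child = -1;
    for (int i = 0; i < max_spin; ++i) {
        child = fork();
        if (child < 0) {
            return -1;          /* fork failed */
        }
        if (child == 0) {
            _exit(0);           /* child: exit right away */
        }
        /* Parent: reap unconditionally, retrying if interrupted. */
        while (waitpid(child, NULL, 0) < 0 && errno == EINTR) {
        }
        if (child >= target_pid) {
            break;              /* target reached, nothing left unreaped */
        }
    }
    return child;
}
```

Waiting before the `child >= target_pid` check means there is no final child left as a zombie when the loop ends.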

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1245104030

