[crac] RFR: PID adjustment on checkpoint [v8]

Sergey Nazarkin snazarki at openjdk.org
Tue Jun 27 12:09:32 UTC 2023

On Tue, 27 Jun 2023 11:45:56 GMT, Roman Marchenko <rmarchenko at openjdk.org> wrote:

>> On restore, PID conflicts may occur if the process was checkpointed in a container, because processes in a container get small PID values. Therefore, when checkpointing in a container, we need to shift the PIDs of new processes so that they start at a particular value, avoiding conflicts on restore.
>> See https://github.com/CRaC/example-lambda/blob/master/checkpoint.cmd.sh#L8 for an example.
>> This PR implements functionality similar to the example above, making it work out of the box. By default, when checkpointing, the PID is adjusted only if Java's PID is 1, which means Java is running in a container. To adjust the PID manually for a checkpointed process, use the `-XX:CRaCMinPid=<value>` option together with `CRaCCheckpointTo`. The minimum `CRaCMinPid` value is 1 and the maximum is `UINT_MAX`, but in practice it is limited by the OS's `pid_max`.
> Roman Marchenko has updated the pull request incrementally with two additional commits since the last revision:
>  - Fixing review comments
>  - Revert "Now CracMinPid option must be set explicitly to adjust PID"
>    This reverts commit b3d66800d6ea441fb86498fdbb229400747eb44f.
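The default-versus-explicit behaviour described in the PR text quoted above can be sketched as below. This is not the PR's code: `effective_min_pid`, `ns_last_pid_value` and `SKETCH_DEFAULT_MIN_PID` are hypothetical names, and the default value 128 is an assumption, since this thread does not state the PR's actual default. The sketch also assumes the usual `ns_last_pid` semantics, where the next PID handed out is normally the stored value plus one.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical default for this sketch only; the PR's real default
 * minimum PID is not stated in this thread. */
#define SKETCH_DEFAULT_MIN_PID 128UL

/*
 * Decide which minimum PID, if any, to enforce before checkpointing.
 *   checkpointing - true when CRaCCheckpointTo was given
 *   self_pid      - getpid() of the Java process
 *   user_min_pid  - value of -XX:CRaCMinPid, or 0 if the option is absent
 * Returns 0 when no adjustment should be made.
 */
unsigned long effective_min_pid(bool checkpointing, pid_t self_pid,
                                unsigned long user_min_pid) {
    if (!checkpointing) {
        return 0;                      /* only relevant when checkpointing */
    }
    if (user_min_pid > 0) {
        return user_min_pid;           /* explicit option wins */
    }
    if (self_pid == 1) {
        return SKETCH_DEFAULT_MIN_PID; /* PID 1: assume we run in a container */
    }
    return 0;                          /* not in a container: leave PIDs alone */
}

/*
 * ns_last_pid holds the last PID allocated in the namespace; the next
 * process normally gets that value + 1 (if it is free). Writing
 * min_pid - 1 therefore makes new processes start at min_pid.
 */
unsigned long ns_last_pid_value(unsigned long min_pid) {
    return min_pid > 0 ? min_pid - 1 : 0;
}
```

For example, with `-XX:CRaCMinPid=500` the launcher would write 499, so the next forked process would normally get PID 500.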

Changes requested by snazarki (no project role).

src/java.base/share/native/launcher/main.c line 122:

> 120:         const int len = strlen(checkpoint_arg);
> 121:         if (0 == strncmp(arg, checkpoint_arg, len)) {
> 122:             crac_min_pid = atoi(arg + len);

`atoi` is no longer recommended because it returns 0 on error, which cannot be distinguished from a successfully parsed 0.
The man page says: "It is recommended to instead use the strtol() and strtoul() family of functions in new programs."
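A minimal sketch of the `strtoul`-based parsing suggested above, applied to the text after `-XX:CRaCMinPid=`. The helper name `parse_crac_min_pid` is hypothetical and not part of the PR; the accepted range 1..`UINT_MAX` follows the PR description. Inputs that do not start with a digit are rejected up front, because `strtoul` would otherwise silently accept a leading minus sign and negate the value.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse the numeric part of -XX:CRaCMinPid=<value>.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * string does not start with a digit, has trailing characters, overflows,
 * or is outside the documented range 1..UINT_MAX.
 */
int parse_crac_min_pid(const char *s, unsigned int *out) {
    if (s == NULL || *s < '0' || *s > '9') {
        return -1;                 /* empty, sign, or whitespace prefix */
    }
    char *end = NULL;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (errno == ERANGE || *end != '\0') {
        return -1;                 /* overflow or trailing junk */
    }
    if (v == 0 || v > UINT_MAX) {
        return -1;                 /* outside 1..UINT_MAX */
    }
    *out = (unsigned int)v;
    return 0;
}
```

Unlike `atoi`, every failure is reported explicitly, so the launcher can print an error instead of quietly treating bad input as 0.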

src/java.base/share/native/launcher/main.c line 195:

> 193:     }
> 194:     const char *last_pid_filename = "/proc/sys/kernel/ns_last_pid";
> 195:     const int last_pid_file = open(last_pid_filename, O_WRONLY|O_CREAT|O_TRUNC, 0666);

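`O_CREAT` looks redundant: the kernel provides this file, so there is nothing to create.
Also, writing to this file requires a special capability (CAP_SYS_ADMIN, or CAP_CHECKPOINT_RESTORE since Linux 5.9). Shouldn't we mention this in the docs?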
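A sketch of the open call without `O_CREAT` (and therefore without the mode argument), keeping the other flags from the quoted line. The helper name `open_existing_wronly` is hypothetical. As far as we know, the kernel checks the capability when the value is written rather than at `open()` time, so a missing capability would typically surface as `EPERM` from `write`, while a missing file is reported here as `ENOENT` instead of being silently created.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Open an already existing file for writing. O_CREAT is left out on
 * purpose, so a missing file is reported (ENOENT) rather than created.
 * Returns the file descriptor, or -1 with errno preserved for the caller.
 */
int open_existing_wronly(const char *path) {
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0) {
        int saved = errno;
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));
        errno = saved;             /* fprintf may clobber errno */
    }
    return fd;
}
```

Called as `open_existing_wronly("/proc/sys/kernel/ns_last_pid")`, this fails cleanly on systems where the file is absent.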

src/java.base/share/native/launcher/main.c line 200:

> 198:     }
> 199:     int res = 0;
> 200:     if (0 > write(last_pid_file, buf, len)) {

I'd compare the result with `len`, to handle all possible `write` return values, including a short write.
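A sketch of comparing the `write` result with `len`, as suggested above. The helper name `write_decimal` is hypothetical. It deliberately treats a short write as an error instead of resuming it, on the assumption that a sysctl file such as `ns_last_pid` should receive its value in a single write; only `EINTR` is retried.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write the decimal representation of value to fd in one write() call.
 * Returns 0 only if write() reports that all len bytes were written;
 * an error or a short write returns -1.
 */
int write_decimal(int fd, unsigned long value) {
    char buf[32];
    int len = snprintf(buf, sizeof(buf), "%lu", value);
    if (len < 0 || (size_t)len >= sizeof(buf)) {
        return -1;                 /* formatting failed or was truncated */
    }
    ssize_t res;
    do {
        res = write(fd, buf, (size_t)len);
    } while (res < 0 && errno == EINTR);   /* retry only on interruption */
    if (res != (ssize_t)len) {
        if (res < 0) {
            fprintf(stderr, "write failed: %s\n", strerror(errno));
        } else {
            fprintf(stderr, "short write: %zd of %d bytes\n", res, len);
        }
        return -1;
    }
    return 0;
}
```

Checking `res != len` covers all three cases in one place: `-1` (error), `0..len-1` (short write), and `len` (success).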


PR Review: https://git.openjdk.org/crac/pull/86#pullrequestreview-1500668309
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1243612548
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1243617425
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1243620761
