[crac] RFR: PID adjustment on checkpoint [v3]

Anton Kozlov akozlov at openjdk.org
Wed Jun 21 13:31:39 UTC 2023


On Wed, 21 Jun 2023 06:17:39 GMT, Roman Marchenko <rmarchenko at openjdk.org> wrote:

>> On restore, there might be PID value conflicts because of small PID values, if it was checkpoint'ed in a container. Therefore, when checkpointing in a container, we need to move PID value for new processes to a particular value to avoid conflicts on restore.
>> 
>> See https://github.com/CRaC/example-lambda/blob/master/checkpoint.cmd.sh#L8 for example.
>> 
>> This PR contains implemented functionality similar to the example above, making this work out of the box. By default, if checkpointing, PID value for new processes starts from 128. 
>> 
>> To set a custom value, `CRAC_MIN_PID` environment variable should be used. 
>> Min `CRAC_MIN_PID` value is 1, max `CRAC_MIN_PID` is not implemented currently.
>
> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Disabling the test for non-linux platforms

src/java.base/share/native/launcher/main.c line 194:

> 192:         return;
> 193:     }
> 194:     if (0 > fprintf(last_pid_file, "%d", pid)) {

Honestly I'd prefer write() to fprintf, to eliminate any possibility of buffering to write partial value to the kernel-exposed file.

src/java.base/share/native/launcher/main.c line 195:

> 193:     }
> 194:     if (0 > fprintf(last_pid_file, "%d", pid)) {
> 195:         perror("last_pid_file fprintf");

We expect EPERM/EACCES, in these cases there should be no error printed.

src/java.base/share/native/launcher/main.c line 201:

> 199: 
> 200: static void spin_last_pid(int pid) {
> 201:     for (pid_t child = fork(); child < (pid_t)pid; child = fork()) {

Although this part is not peformance critical, in my previous experiments, fork() was considerably more expensive compared to some minimalistic thread or vfork(). The best I found was `syscall(SYS_clone, SIGCHLD, NULL, NULL, 0)`. Could you check can that be made not too ugly? Maybe pthread_create?

src/java.base/share/native/launcher/main.c line 209:

> 207:             perror("waitpid last pid");
> 208:             break;
> 209:         }

Can we drop this completely and rely on wait_for_children() below in the control flow?

src/java.base/share/native/launcher/main.c line 328:

> 326:         const char *env_min_pid_str = getenv("CRAC_MIN_PID");
> 327:         const int env_min_pid = env_min_pid_str ? atoi(env_min_pid_str) : 0;
> 328:         // TODO: should it be checked for max pid overflow?

I don't quite follow max_pid problem. Could you elaborate? As for me, I don't see the need for max pid check

src/java.base/share/native/launcher/main.c line 331:

> 329:         const int crac_min_pid = 0 < env_min_pid ? env_min_pid : crac_min_pid_default;
> 330: 
> 331:         if (getpid() <= crac_min_pid) {

A nit: probably `getpid() < crac_min_pid`?

src/java.base/share/native/launcher/main.c line 339:

> 337:                     spin_last_pid(crac_min_pid);
> 338:                 }
> 339:             }

I think we may drop get_last_pid completely. E.g. it should be enough to 

if (!set_last_pid(crac_min_pid - 1)) { // set_last_pid reports status
  spin_last_pid(...)
}

src/java.base/share/native/launcher/main.c line 344:

> 342:             // by creating the main process waiting for children before exit.
> 343:             g_child_pid = fork();
> 344:             if (0 < g_child_pid) {

Does it make sense to check if `g_child_pid < crac_min_pid`? As the ns_last_pid is a subject for races, and instead of guessing what the next pid will be, we'll have the ability to explicity check if the child has the right pid. The child will have to compare its pid with the expected crac_min_pid.

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236978265
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236948875
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236968013
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236968977
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236953632
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236957141
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236956565
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1236976744


More information about the crac-dev mailing list