[crac] RFR: Print better diagnostics for spawning CRIU

Anton Kozlov akozlov at openjdk.org
Wed Aug 2 18:13:09 UTC 2023


On Mon, 31 Jul 2023 03:42:11 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

> It was always a mystery why it does not work this time as there are no error messages printed.
> Still may system call errors are currently not reported but those errors at least do not happen commonly.
> In fact `os::exec_child_process_and_wait` should be more verbose but that is already a part of OpenJDK so I did not modify it.
> It would be much easier if `criuengine.c` was in C++. I was considering to switch it but then I wrote it already in C.
> I understand this code will become obsolete after planned integration of CRIU into JVM but it may not yet be soon enough.

A few comments, but the approach looks good.

src/hotspot/share/runtime/crac.cpp line 203:

> 201:   _crengine_args[1] = "checkpoint";
> 202:   add_crengine_arg(CRaCCheckpointTo);
> 203:   if (os::exec_child_process_and_wait(_crengine, _crengine_args) == 0)

The `checkpoint_restore` suits better to print an error message, please use a consistent reporting method (tty->print(), althoug we should start using unified loggin from some point).B BTW os::exec_child_process_and_wait was introduced in CRaC and it prints error messages in case of problems with creating new processes.

src/java.base/unix/native/criuengine/criuengine.c line 101:

> 99:     char *retval;
> 100:     if (path2[0] == '/') {
> 101:         retval = strdup(path2);

path_join apparently uses path1 only if path2 is relative, otherwise just returns path2. It's unclear from the function name. This looks more like

static char *path_from(const char *cwd, const char *path) {
  // compute abs path to `path`  with the workdir `cwd` assumed
}

src/java.base/unix/native/criuengine/criuengine.c line 115:

> 113: }
> 114: 
> 115: static char *path_abs(const char *rel) {

Is not this exactly `realpath()`?

src/java.base/unix/native/criuengine/criuengine.c line 131:

> 129:     free(rel1_abs);
> 130:     return retval;
> 131: }

This function is closer to path_from, rather than to abs. But by using just path_join/path_from, we'll provide a shorter yet still correct path to the log (the only use of the function), so I propose to drop it.

src/java.base/unix/native/criuengine/criuengine.c line 207:

> 205:         execv(criu, (char**)args);
> 206:         fprintf(stderr, "Cannot execute CRIU \"%s\": %m\n", criu);
> 207:         exit(77);

Please introduce a #define for this value.

src/java.base/unix/native/criuengine/criuengine.c line 215:

> 213:     if (child != wait(&status)) {
> 214:         criu_cmdline = join_args(args);
> 215:         fprintf(stderr, "Error waiting for spawned CRIU \"%s\": %m\n", criu_cmdline);

Calling fprintf repeatedly multiple time would save us from too complex join_args(), right? Could you do this?

src/java.base/unix/native/criuengine/criuengine.c line 227:

> 225:             criu_log = path_abs2(imagedir, log_local);
> 226:             fprintf(stderr, "Spawned CRIU \"%s\" has not properly exited: exit code %d - check %s\n", criu_cmdline, WEXITSTATUS(status), criu_log);
> 227:         }

Could you unify these handlings? There is a slight difference in each code path.

src/java.base/unix/native/criuengine/criuengine.c line 233:

> 231:     }
> 232:     free(criu_cmdline);
> 233:     free(criu_log);

Extensive use of free in the short living application is questionable. By dropping this we'll make it simpler and faster.

-------------

PR Review: https://git.openjdk.org/crac/pull/97#pullrequestreview-1557670255
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281054667
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282205951
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282223771
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282257484
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281064366
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281068032
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281062679
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282258763


More information about the crac-dev mailing list