[crac] RFR: Print better diagnostics for spawning CRIU
Anton Kozlov
akozlov at openjdk.org
Wed Aug 2 18:13:09 UTC 2023
On Mon, 31 Jul 2023 03:42:11 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:
> It was always a mystery why it does not work this time as there are no error messages printed.
> Still may system call errors are currently not reported but those errors at least do not happen commonly.
> In fact `os::exec_child_process_and_wait` should be more verbose but that is already a part of OpenJDK so I did not modify it.
> It would be much easier if `criuengine.c` was in C++. I was considering to switch it but then I wrote it already in C.
> I understand this code will become obsolete after planned integration of CRIU into JVM but it may not yet be soon enough.
A few comments, but the approach looks good.
src/hotspot/share/runtime/crac.cpp line 203:
> 201: _crengine_args[1] = "checkpoint";
> 202: add_crengine_arg(CRaCCheckpointTo);
> 203: if (os::exec_child_process_and_wait(_crengine, _crengine_args) == 0)
The `checkpoint_restore` suits better to print an error message, please use a consistent reporting method (tty->print(), althoug we should start using unified loggin from some point).B BTW os::exec_child_process_and_wait was introduced in CRaC and it prints error messages in case of problems with creating new processes.
src/java.base/unix/native/criuengine/criuengine.c line 101:
> 99: char *retval;
> 100: if (path2[0] == '/') {
> 101: retval = strdup(path2);
path_join apparently uses path1 only if path2 is relative, otherwise just returns path2. It's unclear from the function name. This looks more like
static char *path_from(const char *cwd, const char *path) {
// compute abs path to `path` with the workdir `cwd` assumed
}
src/java.base/unix/native/criuengine/criuengine.c line 115:
> 113: }
> 114:
> 115: static char *path_abs(const char *rel) {
Is not this exactly `realpath()`?
src/java.base/unix/native/criuengine/criuengine.c line 131:
> 129: free(rel1_abs);
> 130: return retval;
> 131: }
This function is closer to path_from, rather than to abs. But by using just path_join/path_from, we'll provide a shorter yet still correct path to the log (the only use of the function), so I propose to drop it.
src/java.base/unix/native/criuengine/criuengine.c line 207:
> 205: execv(criu, (char**)args);
> 206: fprintf(stderr, "Cannot execute CRIU \"%s\": %m\n", criu);
> 207: exit(77);
Please introduce a #define for this value.
src/java.base/unix/native/criuengine/criuengine.c line 215:
> 213: if (child != wait(&status)) {
> 214: criu_cmdline = join_args(args);
> 215: fprintf(stderr, "Error waiting for spawned CRIU \"%s\": %m\n", criu_cmdline);
Calling fprintf repeatedly multiple time would save us from too complex join_args(), right? Could you do this?
src/java.base/unix/native/criuengine/criuengine.c line 227:
> 225: criu_log = path_abs2(imagedir, log_local);
> 226: fprintf(stderr, "Spawned CRIU \"%s\" has not properly exited: exit code %d - check %s\n", criu_cmdline, WEXITSTATUS(status), criu_log);
> 227: }
Could you unify these handlings? There is a slight difference in each code path.
src/java.base/unix/native/criuengine/criuengine.c line 233:
> 231: }
> 232: free(criu_cmdline);
> 233: free(criu_log);
Extensive use of free in the short living application is questionable. By dropping this we'll make it simpler and faster.
-------------
PR Review: https://git.openjdk.org/crac/pull/97#pullrequestreview-1557670255
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281054667
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282205951
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282223771
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282257484
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281064366
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281068032
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1281062679
PR Review Comment: https://git.openjdk.org/crac/pull/97#discussion_r1282258763
More information about the crac-dev
mailing list