[crac] RFR: 8350845: [CRaC] Support C/R engines in form of a library

Radim Vansa rvansa at openjdk.org
Thu Feb 27 20:49:25 UTC 2025


On Thu, 27 Feb 2025 12:19:10 GMT, Timofei Pushkin <tpushkin at openjdk.org> wrote:

> Adds support for C/R engines implemented in form of dynamic libraries.
> 
> In the proposed API the engine controls the set of the configurable options — they are mainly passed by the user via the new `CRaCEngineOptions` VM option, but some may also be passed by the VM. Other VM and/or engine implementations may extend the API via "extensions".
> 
> The JVM is supposed to know what configuration options and API extensions are supported by calling `can_configure` and `get_extension`.
> 
> The user is supposed to know which options they can pass by asking the engine with `-XX:CRaCEngineOptions=help` (support for this is engine-dependent). Example of how this looks for bundled engines:
> 
> $ java -XX:CRaCEngine=simengine -XX:CRaCEngineOptions=help -XX:CRaCCheckpointTo=cr -version
> 
> crexec — pseudo-CRaC-engine used to relay data from JVM to a "real" engine implemented as an executable (instead of a library).
> The engine executable is expected to have CRaC-CRIU-like CLI. Support of the options below also depends on the engine executable.
> 
> Configuration options:
> * keep_running=<true/false> (default: false) — keep the process running after the checkpoint or kill it.
> * direct_map=<true/false> (default: false) — on restore, map process data directly from saved files. This may speedup the restore but the resulting process will not be the same as before the checkpoint.
> * args=<string> (default: "") — free space-separated arguments passed directly to the engine executable, e.g. "--arg1 --arg2 --arg3".
> * help — print this message.
> 
> openjdk version "24-internal" 2025-03-18
> OpenJDK Runtime Environment (slowdebug build 24-internal-adhoc.timpushkin.crac)
> OpenJDK 64-Bit Server VM (slowdebug build 24-internal-adhoc.timpushkin.crac, mixed mode, sharing)
> 
> 
> Notable related changes included in the patch:
> - The ability to pass arguments to C/R engine was removed from `CRaCEngine` because it can now be implemented through `CRaCEngineOptions`. E.g. `-XX:CRaCEngine=criu,--verbose` is now `-XX:CRaCEngine=criu -XX:CRaCEngineOptions=args=--verbose`.
> - `CRaCEngine` (as well as the new `CRaCEngineOptions`) VM Option is now not updated in the restored VM. This seems more correct because engine path and options will very likely differ on restore and on checkpoint, so it is not obvious how to combine them — just replacing won't always work. Until now the updates were ignored by the CRaC implementation anyway, so this should only be visible by inspecting the options through JMX.
> - `CRaCRestoreFrom` is now co...

src/hotspot/share/include/crlib/crlib.h line 67:

> 65:   bool (*can_configure)(crlib_conf_t *, const char *key);
> 66:   // Sets a configuration option. Returns true on success.
> 67:   // Key and value are valid C-strings.

The docs should state that `value` does not have to be valid after the call (is copied if needed).

src/hotspot/share/include/crlib/crlib_restore_data.h line 42:

> 40:   // Called by the restoring application to pass data to the restored application.
> 41:   // 'data' must not be null, size must be greater than 0.
> 42:   // The engine may impose limits on the data size and return false if it is not accepted.

Is the data copied in?

src/hotspot/share/runtime/crac.cpp line 119:

> 117: #endif // _WINDOWS
> 118: 
> 119: class crac::EngineHandle : public CHeapObj<mtInternal> {

I think that `crac.cpp` is already getting too long; It look like this could be a good candidate for moving this and `find_crac_engine` (maybe as a static class method) to `crac_engine.cpp|hpp`?

src/hotspot/share/runtime/crac.cpp line 375:

> 373:   }
> 374:   if (restore_data_api->set_restore_data == nullptr || restore_data_api->get_restore_data == nullptr) {
> 375:     log_debug(crac)("CRaC engine failed to fully initialize API extension: %s", CRLIB_EXTENSION_RESTORE_DATA_NAME);

I would make the functions mandatory in the extension, turning this into an error.

src/hotspot/share/runtime/crac.cpp line 402:

> 400:   }
> 401: 
> 402:   const auto *restore_data_api = get_restore_data_api(_engine->api());

Just a note: I think that eventually we want to pass all the restore data (not just shmid) through the restore_data. Ideally when we know that common implementations can handle any length of these; then the temporary file would be just an implementation detail of `crexec`.

src/java.base/share/native/libcrexec/crexec.c line 55:

> 53: // added first are faster to find.
> 54: #define CONFIGURE_OPTIONS(OPT) \
> 55:   OPT(exec_location, "exec_location") \

#define DUP(x) x, #x
...
OPT(DUP(exec_location)) \

src/java.base/share/native/libcrexec/crexec.c line 103:

> 101:   // * image_location=<path> — path to a directory with checkpoint/restore files.
> 102:   // * exec_location=<path> — path to the engine executable.
> 103:   const int ret = printf(

How about using C++11 raw string literals for multiline text?

src/java.base/share/native/libcrexec/hashtable.c line 48:

> 46: 
> 47:   ht->length = length;
> 48:   ht->keys = (const char **) malloc(length * sizeof(const char **));

I guess you want `sizeof(const char *)` / `sizeof(const void *)` in these?

test/jdk/jdk/crac/CracEngineOptionsTest.java line 135:

> 133:                 "-version");
> 134:         if (opts != null) {
> 135:             pb.command().add(pb.command().size() - 2, "-XX:CRaCEngineOptions=" + opts);

`CRaCEngineOptions` is `ccstrlist`, so sometimes you parse options separated by commas (when passed like above) but if you repeat the option the value gets concatenated with newlines. Could you test that usecase as well?

test/jdk/jdk/crac/PropertyTest.java line 37:

> 35:  */
> 36: 
> 37: public class PropertyTest implements CracTest {

Well, this test doesn't test anything/covers behaviour from other tests. I would rather  remove that completely.

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1973585965
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1973590764
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1973615898
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1973666863
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1974216079
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1974266475
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1974269652
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1974293816
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1974314346
PR Review Comment: https://git.openjdk.org/crac/pull/207#discussion_r1974308739


More information about the crac-dev mailing list