[crac] RFR: Terminate restored process when criuengine restorewait exits

Anton Kozlov akozlov at openjdk.org
Wed Nov 15 19:47:11 UTC 2023


On Thu, 19 Oct 2023 11:21:52 GMT, Radim Vansa <rvansa at openjdk.org> wrote:

> With criuengine, the restored process has the restorewait process as its parent. Scripts that do not expect two processes might signal (e.g. terminate) the parent process, but the actual restored process would then be orphaned.

src/hotspot/os/linux/crac_linux.cpp line 471:

> 469: 
> 470: void crac::set_terminate_with_parent() {
> 471:   if (prctl(PR_SET_PDEATHSIG, SIGTERM)) {

The signal that killed the parent should be SIGKILL, so SIGKILL should be specified here as well.
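A minimal sketch of this suggestion, assuming Linux: arm the parent-death signal with SIGKILL instead of SIGTERM, and narrow the race where the parent exits before the signal is armed. set_kill_with_parent() and current_pdeathsig() are hypothetical helper names, not the code in crac_linux.cpp.

```cpp
#include <sys/prctl.h>
#include <cassert>
#include <csignal>
#include <cstdio>
#include <unistd.h>

// Hypothetical helper: returns the parent-death signal currently armed for
// this process, 0 if none, or -1 if prctl() fails.
static int current_pdeathsig() {
  int sig = 0;
  if (prctl(PR_GET_PDEATHSIG, &sig) != 0) {
    return -1;
  }
  return sig;
}

// Hypothetical body for crac::set_terminate_with_parent(): ask the kernel to
// send SIGKILL (not SIGTERM) when the parent thread exits.
// Returns false if prctl() fails or the parent is already gone.
static bool set_kill_with_parent() {
  // Sample the parent before arming the signal. If the parent exits between
  // this call and prctl(), getppid() differs afterwards because we were
  // reparented. A parent that died before this line is NOT detected.
  const pid_t parent_before = getppid();
  if (prctl(PR_SET_PDEATHSIG, static_cast<unsigned long>(SIGKILL)) != 0) {
    perror("prctl(PR_SET_PDEATHSIG)");
    return false;
  }
  assert(current_pdeathsig() == SIGKILL);  // debug-build sanity check
  return getppid() == parent_before;
}
```

The trade-off of SIGKILL is that it cannot be caught, so the restored JVM runs no shutdown hooks when its parent dies.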

src/hotspot/share/runtime/crac.cpp line 249:

> 247: #endif //LINUX
> 248: 
> 249:   if (ends_with(_crengine, "criuengine")) {

It's strange that the VM assumes the process hierarchy implemented by criuengine. A cleaner way would be to implement something like this in criuengine itself. Setting PR_SET_PDEATHSIG there will probably be complicated, but that can be overcome by adding the code to our CRIU fork if required. Either way, the VM should not need to know that much about how the checkpoint engine behaves.
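One way to keep this out of the VM, sketched under stated assumptions: restorewait knows the restored PID, is its parent, and is single-threaded. This uses signal forwarding, a different technique from PR_SET_PDEATHSIG: restorewait relays catchable termination signals to the restored process and exits with a status mirroring the child's. All names below are hypothetical; this is not criuengine's current code. SIGKILL cannot be caught or forwarded, so a killed restorewait would still need a parent-death signal on the child side.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical: PID of the restored process, published for the handler.
// sig_atomic_t is an int on Linux, wide enough for a pid_t.
static volatile sig_atomic_t g_restored_pid = 0;

// Async-signal-safe: kill() is on the POSIX list of safe functions.
static void forward_signal(int sig) {
  const pid_t pid = static_cast<pid_t>(g_restored_pid);
  if (pid > 0) {
    kill(pid, sig);
  }
}

// Forward the termination signals a wrapper script is likely to send.
// SIGKILL and SIGSTOP cannot be caught, so they can never be forwarded.
static bool install_forwarding(pid_t restored_pid) {
  assert(restored_pid > 0 && "restored PID must be known first");
  g_restored_pid = restored_pid;
  struct sigaction sa {};
  sa.sa_handler = forward_signal;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  const int signals[] = {SIGTERM, SIGINT, SIGHUP};
  for (int sig : signals) {
    if (sigaction(sig, &sa, nullptr) != 0) {
      return false;
    }
  }
  return true;
}

// Wait for the restored process and map its status to a shell-style exit
// code (128 + signal number when it was killed by a signal).
// Returns -1 if waitpid() fails for a reason other than EINTR.
static int wait_and_propagate(pid_t restored_pid) {
  int status = 0;
  while (waitpid(restored_pid, &status, 0) < 0) {
    if (errno != EINTR) {
      return -1;
    }
  }
  if (WIFSIGNALED(status)) {
    return 128 + WTERMSIG(status);
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```

In this shape restorewait would call install_forwarding() as soon as it learns the restored PID and then exit with the result of wait_and_propagate(), so a script that signals or waits on the outer process sees roughly single-process behaviour.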

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/131#discussion_r1394707185
PR Review Comment: https://git.openjdk.org/crac/pull/131#discussion_r1394713637


More information about the crac-dev mailing list