[crac] RFR: Add Checkpoint timeout
Radim Vansa
rvansa at openjdk.org
Fri Dec 8 09:58:49 UTC 2023
On Fri, 8 Dec 2023 09:14:19 GMT, KIRIYAMA Takuya <duke at openjdk.org> wrote:
> The Java process sometimes hangs during a checkpoint for some reason.
> For example, this problem occurs if you specify certain options in CRAC_CRIU_OPTS.
>
>
> # export CRAC_CRIU_OPTS=-V
> # java -XX:CRaCCheckpointTo=/work/cp CRACTest
> CR: Checkpoint ...
>
> The CRACTest process is not killed and keeps waiting for the checkpoint.
>
>
> # ls /work/cp
> cppath perfdata
>
>
> To avoid this problem, I want to add a checkpoint timeout.
> Can I submit a pull request to this repository? I would like you to review this change.
Hello @tkiriyama, could you clarify in what situation CRIU gets stuck? I guess that you've used `-V` just to demonstrate a situation where CRIU does not checkpoint the application as expected.
The checkpointed JVM waits indefinitely because CRIU exits with 0; had the invocation been unsuccessful, the JVM would get the signal to stop waiting. What `criuengine.c` could additionally do is check whether the process still exists once CRIU is done, and signal the JVM if it suspects that CRIU didn't perform the checkpoint. However, this would race with the situation where the process is immediately restored. Given that `CRAC_CRIU_OPTS` are meant to be expert options, I am not sure we should protect against shooting yourself in the foot this way.
I appreciate the effort of creating the PR, and kudos for the test, but I'd like to first hear about any real-world use case where this timeout is useful.
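To make the idea of the extra check concrete, here is a rough sketch in C. It is not the actual `criuengine.c` code: the helper names `process_exists` and `checkpoint_looks_failed` are hypothetical, and the sketch assumes that a successful dump leaves no process with the JVM's pid behind. How the JVM would then be signalled is left out.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: true if a process with this pid currently exists.
 * kill(pid, 0) delivers no signal, it only performs the existence and
 * permission checks. */
static bool process_exists(pid_t pid) {
    if (pid <= 0) {
        /* 0 and negative values address process groups, not one process. */
        return false;
    }
    if (kill(pid, 0) == 0) {
        return true;
    }
    /* EPERM: the process exists but we may not signal it. */
    return errno == EPERM;
}

/* Hypothetical decision: should the waiting JVM be told that the
 * checkpoint did not happen?
 * Assumption: after a successful dump the JVM process is gone, so a
 * surviving process after "CRIU exited with 0" is suspicious.
 * Caveat: this races with an immediate restore that reuses the pid. */
static bool checkpoint_looks_failed(int criu_exit_status, pid_t jvm_pid) {
    if (criu_exit_status != 0) {
        return true; /* the failure path that already signals the JVM */
    }
    return process_exists(jvm_pid);
}
```

With `CRAC_CRIU_OPTS=-V`, CRIU would exit with 0 while the JVM is still alive, so `checkpoint_looks_failed(0, jvm_pid)` would return true in this sketch. The same result could also be reported for a process that was checkpointed and immediately restored, which is the race described above.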
-------------
PR Comment: https://git.openjdk.org/crac/pull/147#issuecomment-1846884698