[crac] RFR: PID adjustment on checkpoint [v3]
Roman Marchenko
rmarchenko at openjdk.org
Thu Jun 22 13:33:39 UTC 2023
On Thu, 22 Jun 2023 12:29:48 GMT, Radim Vansa <rvansa at openjdk.org> wrote:
>> Should Java fail if PID cannot be moved to a desired PID value? Or just warn and go on?
>
> If the value was explicitly set, I think it would be better to fail. When it's trying to get to PID 128 by default I think it is sufficient to warn the user **and** tell him that he could switch off the warning setting `-XX:CRMinPid=1`.
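A minimal sketch of the fail-or-warn policy suggested above, assuming the caller knows whether the desired PID was set explicitly. The enum, the function name and the message wording are hypothetical; only the `-XX:CRMinPid` flag comes from the discussion:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome of a failed PID adjustment. */
typedef enum { PID_ADJUST_WARN, PID_ADJUST_FAIL } pid_adjust_action;

/* Decide what to do when the desired PID could not be reached.
 * explicitly_set: true if the user passed -XX:CRMinPid explicitly,
 * false if the default (128) was used. */
pid_adjust_action on_pid_adjust_failure(bool explicitly_set, long desired, long reached) {
  if (explicitly_set) {
    /* An explicit request that cannot be met is treated as an error. */
    fprintf(stderr, "Error: cannot adjust PID to %ld (reached %ld)\n",
            desired, reached);
    return PID_ADJUST_FAIL;
  }
  /* Default value: warn and tell the user how to switch the warning off. */
  fprintf(stderr,
          "Warning: cannot adjust PID to %ld (reached %ld); "
          "set -XX:CRMinPid=1 to switch off this warning\n",
          desired, reached);
  return PID_ADJUST_WARN;
}
```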
I did some experiments with PID spinning and a desired PID value that exceeds `pid_max`. Spinning PIDs until the PID counter wraps around takes too long. If the user sets a wrong value, Java may appear to hang, and the user is unlikely to wait long enough to see the error message. The same is true for a valid but large desired PID value, e.g. 2_000_000.
We could remove the `waitpid()` call to speed up PID spinning, but without it we can easily hit the container's resource limits (I tested this), so we cannot simply drop `waitpid()`.
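A minimal sketch of a single spinning step, assuming spinning is done by forking a child that exits immediately. The helper name `spawn_and_reap` is hypothetical. It shows why `waitpid()` is needed: an unreaped child stays a zombie, keeps its PID allocated and still counts against a container PID limit such as the cgroup `pids.max`.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once, then reap it.
 * Returns the child's PID, or -1 on error. */
pid_t spawn_and_reap(void) {
  pid_t child = fork();
  if (child < 0) {
    return -1;              /* e.g. EAGAIN when a PID limit is reached */
  }
  if (child == 0) {
    _exit(0);               /* child: consume one PID and exit immediately */
  }
  int status;
  /* Reap the child so it does not linger as a zombie holding its PID. */
  while (waitpid(child, &status, 0) < 0) {
    if (errno != EINTR) {
      return -1;
    }
  }
  return child;
}
```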
To avoid both reading `kernel/pid_max` and hanging on PID spinning, we could introduce a maximum number of spin tries, say 10_000. If we reach this limit while spinning PIDs, we'd stop spinning and continue running Java with the PID reached so far. It actually seems doubtful that users want to move the PID to 2M starting from PID 1 or 8 in a container. If users have some processes running in their container, they'd adjust the desired PID value at checkpoint according to the state of the container, within the max try count we introduce. This solution seems portable across POSIX-like platforms.
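A minimal sketch of bounded spinning, assuming the goal is to push the last allocated PID to at least `target - 1` so the next PID handed out is likely to be at least `target` (other processes forking concurrently can still interfere). The function name and the extra wrap-around check are illustrative choices, not the PR's code:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spin PIDs until a child gets a PID >= target - 1, giving up after
 * max_tries forks or when the PID counter wraps around.
 * Stores the last child PID in *last; returns the number of forks done,
 * or -1 on error. */
int spin_pid_bounded(pid_t target, int max_tries, pid_t *last) {
  pid_t prev = 0;
  *last = 0;
  for (int tries = 1; tries <= max_tries; tries++) {
    pid_t child = fork();
    if (child < 0) {
      return -1;
    }
    if (child == 0) {
      _exit(0);                    /* child only consumes a PID */
    }
    int status;
    while (waitpid(child, &status, 0) < 0) {   /* reap to free the PID */
      if (errno != EINTR) {
        return -1;
      }
    }
    *last = child;
    if (child >= target - 1) {
      return tries;                /* target reached */
    }
    if (child < prev) {
      return tries;                /* counter wrapped past pid_max: give up */
    }
    prev = child;
  }
  return max_tries;                /* limit hit: continue with *last */
}
```

Usage: `spin_pid_bounded(128, 10000, &last)` stops after at most 10_000 forks, so an unreachable value such as 2_000_000 costs a bounded amount of time instead of appearing as a hang.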
Or, since this is becoming complicated, it would be easier to read `pid_max`, though that works only on Linux.
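A minimal sketch of the Linux-only alternative, assuming the limit is read from `/proc/sys/kernel/pid_max`. The helper names are hypothetical. PIDs are allocated strictly below `pid_max`, so a desired PID at or above it can be rejected immediately instead of spinning:

```c
#include <stdbool.h>
#include <stdio.h>

/* Read the PID limit on Linux. Returns -1 if the file is missing or
 * unreadable (non-Linux systems, restricted /proc). */
long read_pid_max(void) {
  FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
  if (f == NULL) {
    return -1;
  }
  long value = -1;
  if (fscanf(f, "%ld", &value) != 1) {
    value = -1;
  }
  fclose(f);
  return value;
}

/* The largest PID the kernel hands out is pid_max - 1. */
bool desired_pid_reachable(long desired, long pid_max) {
  return desired > 0 && desired < pid_max;
}
```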
Please note that I'm talking about PID spinning, i.e. the case when writing to `ns_last_pid` hasn't worked for some reason.
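For contrast, a minimal sketch of the preferred path that spinning falls back from, assuming Linux and `/proc/sys/kernel/ns_last_pid`. The helper names are hypothetical. The kernel allocates `ns_last_pid + 1` next (if that PID is free), and the write is refused without `CAP_SYS_ADMIN` (or `CAP_CHECKPOINT_RESTORE` on Linux 5.9+), which is one reason the spinning fallback exists:

```c
#include <stdio.h>
#include <sys/types.h>

/* Value to write so that the next PID allocated in this PID namespace
 * is `desired` (provided that PID is free). */
long ns_last_pid_value_for(pid_t desired) {
  return (long)desired - 1;
}

/* Returns 0 on success, -1 otherwise; on failure the caller may fall
 * back to PID spinning. */
int set_ns_last_pid(pid_t desired) {
  FILE *f = fopen("/proc/sys/kernel/ns_last_pid", "w");
  if (f == NULL) {
    return -1;
  }
  int ok = fprintf(f, "%ld", ns_last_pid_value_for(desired)) > 0;
  /* stdio buffers the write; a permission error (EPERM) surfaces here. */
  if (fclose(f) != 0) {
    ok = 0;
  }
  return ok ? 0 : -1;
}
```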
Are there any additional pros/cons?
-------------
PR Review Comment: https://git.openjdk.org/crac/pull/86#discussion_r1238530470