ProcessHandle.Info startInstant() drifts with system time on Linux
Roger Riggs
Roger.Riggs at oracle.com
Wed Aug 7 17:42:20 UTC 2019
Hi Markus,
core-libs-dev at openjdk.java.net is the more appropriate list for this
question.
The ProcessHandle info is based directly on the OS information about a
process; no separate information is stored or kept.
If Linux had a stable way to represent the start time it would be
reflected in the ProcessHandle info.
Changing the clock on any running system is fraught: so many
time-related actions depend on a linear and/or monotonic progression of time.
Re-reading the boot time would be a performance hit and be subject to a
race condition with setting time.
The process handle also uses the start time to validate a ProcessHandle;
since the cached boot time does not change, the value is stable for that
check within a running JVM.
A workaround for your application might be to save a relative start
time instead: read the boot time directly from /proc/stat, or read the
process start time relative to boot from /proc/<pid>/stat.
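
A minimal sketch of that workaround, assuming the application is free to read /proc/<pid>/stat itself. The class and method names (ProcStartTicks, parseStartTicks, readStartTicks) are hypothetical and not part of the JDK. It extracts starttime (field 22), which is expressed in clock ticks since boot and therefore does not depend on the wall clock:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a JDK API: reads the boot-relative start time of a process.
final class ProcStartTicks {

    /** Extracts starttime (field 22, clock ticks since boot) from the content of /proc/<pid>/stat. */
    static long parseStartTicks(String statLine) {
        // Field 2 (comm) is wrapped in parentheses and may itself contain spaces or ')',
        // so everything up to the LAST ')' is skipped before splitting.
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            throw new IllegalArgumentException("unexpected stat format");
        }
        String[] rest = statLine.substring(close + 1).trim().split("\\s+");
        // rest[0] is field 3 (state), so field 22 is rest[19].
        if (rest.length < 20) {
            throw new IllegalArgumentException("too few fields in stat line");
        }
        return Long.parseLong(rest[19]);
    }

    /** Reads /proc/<pid>/stat (Linux only) and returns the start time in clock ticks since boot. */
    static long readStartTicks(long pid) throws IOException {
        return parseStartTicks(Files.readString(Path.of("/proc", Long.toString(pid), "stat")));
    }

    public static void main(String[] args) throws IOException {
        long pid = ProcessHandle.current().pid();
        System.out.println("pid " + pid + " started at tick " + readStartTicks(pid));
    }
}
```

Storing the PID together with this tick value, rather than an absolute Instant, should give a key that stays the same across JVM restarts and clock adjustments. The tick count is only meaningful within one boot, so an application may also want to record something that identifies the boot, for example the content of /proc/sys/kernel/random/boot_id.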
Roger
On 8/7/19 7:34 AM, Duft Markus wrote:
> Hey,
>
>
> Just checking in to see whether this is a bug, or by design (and where I should report this if it /is/ a bug :)).
>
>
> We discovered that ProcessHandle.Info.startInstant() seems to be non-constant for a given long-running process when restarting the Java VM. We have a deployment tool (https://bdeploy.io) which monitors processes and tries to recover running process information when starting up. The general process would be:
>
>
> 1) Start BDeploy
>
> 2) BDeploy starts a child process, records its PID and startInstant to be able to find it again later
>
> 3) BDeploy is stopped (for whatever reason) and the child process keeps running (we do make sure that this is the case :)).
>
> 4) BDeploy is started again, tries to find running processes from its previous life to resume monitoring.
>
> 5) BDeploy reads PID and startInstant from a file, finds a ProcessHandle using the PID and compares the startInstant. This is to avoid finding wrong processes when PIDs wrap.
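
Steps 2 and 5 above can be sketched roughly as follows. This is an illustration only, not BDeploy's actual code; ProcessRecovery, Saved, snapshot and recover are hypothetical names, and persisting the record to a file is left out:

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical illustration of the recovery check described in steps 2 and 5.
final class ProcessRecovery {

    /** What would be written to the file in step 2. */
    static final class Saved {
        final long pid;
        final Instant start;

        Saved(long pid, Instant start) {
            this.pid = pid;
            this.start = start;
        }
    }

    /** Step 2: capture PID and start instant; empty if the OS does not report a start time. */
    static Optional<Saved> snapshot(ProcessHandle handle) {
        return handle.info().startInstant().map(start -> new Saved(handle.pid(), start));
    }

    /** Step 5: look the PID up again and accept it only if the start instant is identical. */
    static Optional<ProcessHandle> recover(Saved saved) {
        return ProcessHandle.of(saved.pid)
                .filter(h -> h.info().startInstant().filter(saved.start::equals).isPresent());
    }

    public static void main(String[] args) {
        Optional<Saved> saved = snapshot(ProcessHandle.current());
        saved.ifPresent(s -> System.out.println("recovered: " + recover(s).isPresent()));
    }
}
```

The exact equality in recover is what makes the scheme sensitive to the reported start instant moving by even a small amount between two JVM runs.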
>
>
> Now we have the problem that this does not always work. Analysis has led us to the point where we identified NTPD or manual clock setting as the root cause. It seems that a system clock change will change the absolute timestamp of a process start. I had a look at the JDK sources and found that this is actually true :) Here is what I found out and documented on our own bugtracker:
>
>
> The relevant Java native method on Linux does this:
>
> /*
>  * Try to stat and then open /proc/%d/stat
>  */
> snprintf(fn, sizeof fn, "/proc/%d/stat", pid);
>
> fp = fopen(fn, "r");
> ...
>
> // Scan the needed fields from status, retaining only ppid(4),
> // utime (14), stime(15), starttime(22)
> if (4 != sscanf(s, " %*c %d %*d %*d %*d %*d %*d %*u %*u %*u %*u %lu %lu %*d %*d %*d %*d %*d %*d %llu",
>                 &parentPid, &utime, &stime, &start)) {
>     return 0; // not all values parsed; return error
> }
>
> *totalTime = (utime + stime) * (jlong)(1000000000 / clock_ticks_per_second);
>
> *startTime = bootTime_ms + ((start * 1000) / clock_ticks_per_second);
> ...
>
>
> So the process start time is calculated relative to the system kernel boot time. The boot time is calculated ONCE when starting the Java VM like this:
>
> fp = fopen("/proc/stat", "r");
> ...
>
> while (getline(&line, &len, fp) != -1) {
>     if (sscanf(line, "btime %llu", &bootTime) == 1) {
>         break;
>     }
> }
> ...
>
> return bootTime * 1000;
>
>
> However, the /proc/stat btime field does not seem to be constant. When I manually set the clock 2 minutes ahead, the btime field follows along, and is now two minutes later than before. Thus any system time correction (manual, ntpd, ...) will change the system boot time, which will make the timestamp different, but only after restarting the JVM.
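
To repeat that observation from Java, a small sketch along these lines could be run before and after a clock change. BootTime and parseBtime are hypothetical names; the only assumption about /proc/stat is the "btime <seconds>" line that the native code quoted above also scans for:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

// Hypothetical helper: extracts the btime value from the content of /proc/stat.
final class BootTime {

    /** Returns the number following "btime " in the given /proc/stat content, if such a line exists. */
    static OptionalLong parseBtime(String procStat) {
        for (String line : procStat.split("\n")) {
            if (line.startsWith("btime ")) {
                return OptionalLong.of(Long.parseLong(line.substring("btime ".length()).trim()));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        // Linux only: prints the boot time as currently reported by the kernel.
        System.out.println(parseBtime(Files.readString(Path.of("/proc/stat"))));
    }
}
```

If the printed value differs between two runs on the same boot, any absolute start time derived from it will differ by the same amount.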
>
> This is IMHO a bug in the JVM. The btime is a relative timestamp (uptime in nanoseconds if I'm correct). The absolute representation of this timestamp is calculated on the fly when reading /proc/stat from the current date/time (assuming that the current date/time is correct). This is not a problem (and correct!) for a human reader. The field should just never be taken to calculate another absolute timestamp, which Java does...
>
> So we pseudo-have:
> bootTime_ms
> btime = current-clock-timestamp - kernel-uptime;
>
> proc-start = btime + process-uptime;
>
> All three variables are mutable and may (and will) change. If (and only if) kernel-uptime and process-uptime are updated correctly, calculating the absolute start-time will yield a reproducible result only as long as the system clock does not change.
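
As a worked example of that arithmetic, the sketch below applies the same formula as the native code quoted earlier (boot time in milliseconds plus start ticks converted to milliseconds) to two btime values that differ by two minutes. All numbers are made up for illustration, including the tick rate of 100 per second, and StartTimeDrift and startMillis are hypothetical names:

```java
// Hypothetical worked example of how a shifted btime moves the computed start time.
final class StartTimeDrift {

    /** Same formula as the quoted native code: bootTime_ms + (start * 1000) / clock_ticks_per_second. */
    static long startMillis(long btimeSeconds, long startTicks, long ticksPerSecond) {
        return btimeSeconds * 1000 + (startTicks * 1000) / ticksPerSecond;
    }

    public static void main(String[] args) {
        long ticksPerSecond = 100;   // assumed tick rate
        long startTicks = 360_000;   // process started one hour after boot (made-up value)

        long before = startMillis(1_565_000_000L, startTicks, ticksPerSecond);
        // The clock is set two minutes ahead and the JVM is restarted, so btime is read again.
        long after = startMillis(1_565_000_120L, startTicks, ticksPerSecond);

        System.out.println(after - before); // prints 120000
    }
}
```

The process itself has not changed, yet the two JVM runs disagree about its start time by exactly the size of the clock adjustment, while startTicks is identical in both.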
>
>
> I'm pretty sure this qualifies as a bug, but not whether it's a Java, a kernel (or whatever) bug... Any advice, also on how to get around this, would be greatly appreciated. I would really like to avoid hard-coding allowed drift ranges or something like this, especially as we'll be running into timezone, summer time, etc. issues AFAICT...
>
>
> Cheers,
>
> Thanks for ANY help,
>
> Markus
>
> SSI Schäfer IT Solutions GmbH | Friesachstrasse 15 | 8114 Friesach | Austria
> Registered Office: Friesach | Commercial Register: 49324 K | VAT no. ATU28654300
> Commercial Court: Landesgericht für Zivilrechtssachen Graz
> You can find our information on the handling of personal data here<https://www.ssi-schaefer.com/en-at/privacy-13258>.