ProcessHandle.Info startInstant() drifts with system time on Linux
Duft Markus
Markus.Duft at ssi-schaefer.com
Wed Aug 7 13:28:26 UTC 2019
Hey,
Since attachments don't seem to work and our company blocks all file sharing (yikes), here the source inline, sorry for the noise...
----------- TEST.java ---------------
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class TEST {
private static final String FN = "info.pid";
private static final String[] CMD = { "/usr/bin/sleep", "10m" };
public static void main(String[] args) throws Exception {
if (args[0].equals("start")) {
doStart();
} else if (args[0].equals("recover")) {
doRecover();
}
}
private static void doRecover() throws Exception {
Path f = Paths.get(FN);
if (!Files.exists(f)) {
System.err.println(f + " does not exist, run with 'start' first");
}
String[] fields = new String(Files.readAllBytes(f), StandardCharsets.UTF_8).split(",");
long pid = Long.valueOf(fields[0]);
long recStartMillis = Long.valueOf(fields[1]);
ProcessHandle ph = ProcessHandle.of(pid).orElseThrow();
long phStartMillis = ph.info().startInstant().orElseThrow().toEpochMilli();
System.out.println("Found handle. PID=" + ph.pid() + ", start=" + phStartMillis);
if (recStartMillis != phStartMillis) {
System.err.println("Time drift detected! Start times: recorded=" + recStartMillis + ", actual=" + phStartMillis);
}
}
private static void doStart() throws Exception {
Path f = Paths.get(FN);
if (Files.exists(f)) {
Files.delete(f);
}
ProcessBuilder builder = new ProcessBuilder(CMD);
ProcessHandle handle = builder.start().toHandle();
long startMillis = handle.info().startInstant().orElseThrow().toEpochMilli();
System.out.println("Started. PID=" + handle.pid() + ", start=" + startMillis);
// write to file, remember
Files.write(f,
String.join(",", String.valueOf(handle.pid()), String.valueOf(startMillis)).getBytes(StandardCharsets.UTF_8));
}
}
----------------------------------------
Meanwhile I've been digging deeper and ended up in the kernel sources. Ultimately the function getboottime64 from ./kernel/time/timekeeping.c is used to determine the boot time. It's documentation tells a lot :)
/**
* getboottime64 - Return the real time of system boot.
* @ts: pointer to the timespec64 to be set
*
* Returns the wall-time of boot in a timespec64.
*
* This is based on the wall_to_monotonic offset and the total suspend
* time. Calls to settimeofday will affect the value returned (which
* basically means that however wrong your real time clock is at boot time,
* you get the right time here).
*/
void getboottime64(struct timespec64 *ts)
{
struct timekeeper *tk = &tk_core.timekeeper;
ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
*ts = ktime_to_timespec64(t);
}
Cheers,
Markus
________________________________________
From: jdk-dev <jdk-dev-bounces at openjdk.java.net> on behalf of Duft Markus
Sent: Wednesday, August 7, 2019 14:48
To: jdk-dev at openjdk.java.net
Subject: Re: ProcessHandle.Info startInstant() drifts with system time on Linux
Hey,
I have a test program to demonstrate the issue as well now:
mduft at fril0041 /ssd/workspaces/deployment/git/deployment/minion/bin/main $ date
Wed Aug 7 14:44:30 CEST 2019
mduft at fril0041 /ssd/workspaces/deployment/git/deployment/minion/bin/main $ $JAVA_HOME/bin/java TEST start
Started. PID=17210, start=1565181874360
mduft at fril0041 /ssd/workspaces/deployment/git/deployment/minion/bin/main $ $JAVA_HOME/bin/java TEST recover
Found handle. PID=17210, start=1565181874360
mduft at fril0041 /ssd/workspaces/deployment/git/deployment/minion/bin/main $ date
Wed Aug 7 14:44:38 CEST 2019
# changed system time to 5 minutes in the future here!
mduft at fril0041 /ssd/workspaces/deployment/git/deployment/minion/bin/main $ date
Wed Aug 7 14:49:03 CEST 2019
mduft at fril0041 /ssd/workspaces/deployment/git/deployment/minion/bin/main $ $JAVA_HOME/bin/java TEST recover
Found handle. PID=17210, start=1565182127360
Time drift detected! Start times: recorded=1565181874360, actual=1565182127360
mduft at fril0041 /ssd/workspaces/deployment/git/deployment/minion/bin/main $
The program is attached... Its as simple as possible, no error handling whatsoever, don't bash me ;)
I run and tested with:
openjdk 10.0.2-adoptopenjdk 2018-07-17
OpenJDK Runtime Environment (build 10.0.2-adoptopenjdk+13)
OpenJDK 64-Bit Server VM (build 10.0.2-adoptopenjdk+13, mixed mode)
(yep, we're still on 10, 11 is effort and we need third parties to support it first ;)).
Cheers,
Markus
________________________________________
From: jdk-dev <jdk-dev-bounces at openjdk.java.net> on behalf of Duft Markus
Sent: Wednesday, August 7, 2019 13:34
To: jdk-dev at openjdk.java.net
Subject: ProcessHandle.Info startInstant() drifts with system time on Linux
Hey,
Just checking in to see whether this is a bug, or by design (and where I should report this if it /is/ a bug :)).
We discovered, that ProcessHandle.Info.startInstant() seems to be non-constant for a given long running process when restarting the Java VM. We have a deployment tool (https://bdeploy.io) which monitors processes and tries to recover running process information when starting up. The general process would be:
1) Start BDeploy
2) BDeploy starts a child process, records its PID and startInstant to be able to find it again later
3) BDeploy is stopped (for whatever reason) and the child process keeps running (we do make sure that this is the case :)).
4) BDeploy is started again, tries to find running processes from its previous life to resume monitoring.
5) BDeploy reads PID and startInstant from a file, finds a ProcessHandle using the PID and compares the startInstant. This is to avoid finding wrong processes when PIDs wrap.
Now we have the problem that this does not work always. Analysis have led us to the point where we identifier NTPD or manual clock setting as the root cause. It seems that a system clock change will change the absolute timestamp of a process start. I had a look at the JDK sources and found that this is actually true :) Here is what I found out and documented on our own bugtracker:
The relevant java native method on l??inux does this:
/*
* Try to stat and then open /proc/%d/stat
*/
snprintf(fn, sizeof fn, "/proc/%d/stat", pid);
fp = fopen(fn, "r");
...
// Scan the needed fields from status, retaining only ppid(4),
// utime (14), stime(15), starttime(22)
if (4 != sscanf(s, " %*c %d %*d %*d %*d %*d %*d %*u %*u %*u %*u %lu %lu %*d %*d %*d %*d %*d %*d %llu",
&parentPid, &utime, &stime, &start)) {
return 0; // not all values parsed; return error
}
*totalTime = (utime + stime) * (jlong)(1000000000 / clock_ticks_per_second);
*startTime = bootTime_ms + ((start * 1000) / clock_ticks_per_second);
...
So the process start time is calculated rela?tive to the system kernel boot time. The boot time is calculated ONCE when starting the java VM like this:
fp = fopen("/proc/stat", "r");
...
while (getline(&line, &len, fp) != -1) {
if (sscanf(line, "btime %llu", &bootTime) == 1) {
break;
}
}
...
return bootTime * 1000;
However, the /proc/stat btime field does not seem to be constant. When I manually set the clock 2 minutes ahead, the btime field follows along, and is now two minutes later than before. Thus any system time correction (manual, ntpd, ...) will change the system boot time, which will make the timestamp different, but only after restarting the JVM.
This is IMHO a bug in the JVM. The btime is a relative timestamp (uptime in nanoseconds if i'm correct). The absolute representation of this timestamp is calculated on the fly when reading /proc/stat from the current date/time (assuming that the current date/time is correct). This is not a problem (and correct!) for a human reader. The field should just never be taken to calculate another absolute timestamp, which java does...
So we pseudo-have:
btime = current-clock-timestamp - kernel-uptime;
proc-start = btime + process-uptime;
All three variables are mutable and may (and will) change. If (and only if) kernel-uptime and process-uptime are updated correctly, calculating the absolute start-time will yield a reproducible result only as long as the system clock does not change.
I'm pretty sure this qualifies as bug, but not whether it's a java a kernel (or whatever) bug... Any advice, also how to get around this, would be greatly appreciated. I would really like to avoid hard-coding allowed drift ranges or something like this, especially as we'll be running into timezone, summer time, etc. issues AFAICT...
Cheers,
Thanks for ANY help,
Markus
SSI Schäfer IT Solutions GmbH | Friesachstrasse 15 | 8114 Friesach | Austria
Registered Office: Friesach | Commercial Register: 49324 K | VAT no. ATU28654300
Commercial Court: Landesgericht für Zivilrechtssachen Graz
Unsere Hinweise zum Umgang mit personenbezogenen Daten finden Sie hier<https://www.ssi-schaefer.com/de-at/datenschutz-49548>.
You can find our information on the handling of personal data here<https://www.ssi-schaefer.com/en-at/privacy-13258>.
SSI Schäfer IT Solutions GmbH | Friesachstrasse 15 | 8114 Friesach | Austria
Registered Office: Friesach | Commercial Register: 49324 K | VAT no. ATU28654300
Commercial Court: Landesgericht für Zivilrechtssachen Graz
Unsere Hinweise zum Umgang mit personenbezogenen Daten finden Sie hier<https://www.ssi-schaefer.com/de-at/datenschutz-49548>.
You can find our information on the handling of personal data here<https://www.ssi-schaefer.com/en-at/privacy-13258>.
SSI Schäfer IT Solutions GmbH | Friesachstrasse 15 | 8114 Friesach | Austria
Registered Office: Friesach | Commercial Register: 49324 K | VAT no. ATU28654300
Commercial Court: Landesgericht für Zivilrechtssachen Graz
Unsere Hinweise zum Umgang mit personenbezogenen Daten finden Sie hier<https://www.ssi-schaefer.com/de-at/datenschutz-49548>.
You can find our information on the handling of personal data here<https://www.ssi-schaefer.com/en-at/privacy-13258>.
More information about the jdk-dev
mailing list