ProcessHandle.Info startInstant() drifts with system time on Linux

Duft Markus Markus.Duft at ssi-schaefer.com
Thu Aug 8 05:52:23 UTC 2019


Hey,

Thanks for the answer and the information. OK, I understand that Linux does not provide a stable absolute timestamp per process, so the JVM has to derive one. Would it be possible to give users another API to uniquely identify a process, i.e. an arbitrary "checksum"-like value that identifies a handle even across VM restarts? Linux does have a stable /relative/ timestamp, and together with the PID this is an (as-good-as-)unique ID. The same holds on Windows and other systems: the PID plus the start time, whether relative to boot or absolute, is sufficient to identify any process.
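
A minimal sketch of the kind of "checksum"-like value suggested here, assuming the start value is a clock-independent number such as the starttime ticks from /proc/[pid]/stat. The class name ProcessIdentity and its methods are hypothetical illustrations, not an existing JDK API:

```java
import java.util.Objects;

// Hypothetical value type: PID plus a start value that does not depend on the
// wall clock (for example the starttime ticks from /proc/[pid]/stat).
final class ProcessIdentity {
    private final long pid;
    private final long startTicks;

    ProcessIdentity(long pid, long startTicks) {
        this.pid = pid;
        this.startTicks = startTicks;
    }

    long pid() { return pid; }

    long startTicks() { return startTicks; }

    // Serializes to "pid:startTicks" so it can be stored in a state file.
    String encode() {
        return pid + ":" + startTicks;
    }

    // Parses a value produced by encode().
    static ProcessIdentity decode(String encoded) {
        int sep = encoded.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("Expected pid:startTicks, got: " + encoded);
        }
        return new ProcessIdentity(
                Long.parseLong(encoded.substring(0, sep).trim()),
                Long.parseLong(encoded.substring(sep + 1).trim()));
    }

    // Same PID and same start value: with very high likelihood the same process.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ProcessIdentity)) return false;
        ProcessIdentity other = (ProcessIdentity) o;
        return pid == other.pid && startTicks == other.startTicks;
    }

    @Override
    public int hashCode() {
        return Objects.hash(pid, startTicks);
    }

    @Override
    public String toString() {
        return encode();
    }
}
```

As with any PID-plus-start-time key, a reboot that happens to reuse both the PID and the start offset would still produce a false match.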

Right now I ended up with this piece of code:

----------------------
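import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;

    private long internalGetProcessStartTimestampCorrected(long pid, Instant reportedStartTime) {
        if (OsHelper.getRunningOs() == OperatingSystem.LINUX) {
            try {
                // Read the single line from /proc/[pid]/stat; field 22 is the start time
                // in clock ticks since boot, which does not depend on the wall clock.
                String line = new String(Files.readAllBytes(Paths.get("/proc", String.valueOf(pid), "stat")),
                        StandardCharsets.UTF_8);
                // Field 2 (the command name) is wrapped in parentheses and may contain spaces,
                // so only split what follows the last ')'. That part starts with field 3,
                // which puts field 22 at index 19.
                String[] split = line.substring(line.lastIndexOf(')') + 1).trim().split("\\s+");
                return Long.parseLong(split[19]);
            } catch (Exception e) {
                logger.log(l -> l.warn("Cannot read corrected start time of process, PID = {}.", pid, e));
            }
        }

        // we (for now) trust the OS to deliver a stable absolute timestamp.
        return reportedStartTime.toEpochMilli();
    }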
----------------------

This takes the Instant reported by ProcessHandle.Info and returns a stable value which can later be used to verify that the PID still refers to the same process. On Linux the value is read from /proc/[pid]/stat in the same way the JVM does, just without adding the boot time, so the timestamp (in clock ticks since boot) is relative but stable, which is the important part. I'm aware that there is /some/ potential to get it wrong when, after a reboot, a recovered process has the same PID and the same offset, but I think the risk is negligible...
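
The verification step described above could look roughly like the following sketch. It assumes Linux and the /proc/[pid]/stat layout documented in proc(5); StableStartCheck and its methods are hypothetical names. It skips past the last ')' before splitting, because the command name in field 2 may contain spaces, which would shift the indices of a naive split(" "):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;

// Hypothetical helper: checks whether a PID still refers to the process whose
// starttime (clock ticks since boot, field 22 of /proc/[pid]/stat) was recorded earlier.
final class StableStartCheck {
    private StableStartCheck() {
    }

    // Field 2 (comm) is parenthesized and may contain spaces or ')', so parsing
    // starts after the LAST ')'. The remainder begins at field 3, putting field 22 at index 19.
    static long parseStartTicks(String statLine) {
        int end = statLine.lastIndexOf(')');
        if (end < 0) {
            throw new IllegalArgumentException("Malformed stat line: " + statLine);
        }
        String[] fields = statLine.substring(end + 1).trim().split("\\s+");
        if (fields.length < 20) {
            throw new IllegalArgumentException("Too few fields: " + statLine);
        }
        return Long.parseLong(fields[19]);
    }

    // Linux only: true if /proc/[pid]/stat exists and reports the recorded start ticks.
    static boolean isSameProcess(long pid, long recordedStartTicks) throws IOException {
        try {
            byte[] raw = Files.readAllBytes(Paths.get("/proc", Long.toString(pid), "stat"));
            return parseStartTicks(new String(raw, StandardCharsets.UTF_8)) == recordedStartTicks;
        } catch (NoSuchFileException e) {
            return false; // the process no longer exists
        }
    }
}
```

Since the recorded value never passes through the boot time, a clock change between the two runs does not affect the comparison.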

However, this solution feels a little odd. I would prefer not to read OS-dependent things directly, but rather have the JVM help me with unique identification. Any ideas? Comments?

I am aware that timekeeping is /always/ a pain, especially when dependent on the clock (timezone, DST, ...).

Cheers, Thanks,
Markus
________________________________________
From: jdk-dev <jdk-dev-bounces at openjdk.java.net> on behalf of Roger Riggs <Roger.Riggs at oracle.com>
Sent: Wednesday, August 7, 2019 19:42
To: core-libs-dev
Subject: Re: ProcessHandle.Info startInstant() drifts with system time on Linux

Hi Markus,

core-libs-dev at openjdk.java.net is the more appropriate list for this
question.

The ProcessHandle info is based directly on the OS information about a
process; no separate information is stored or kept.
If Linux had a stable way to represent the start time it would be
reflected in the ProcessHandle info.
Changing the clock on any running system is fraught. So many time-related
actions depend on linear and/or monotonic progression of time.

Re-reading the boot time would be a performance hit and would be subject to
a race condition with setting the time.
The process handle also uses the start time to validate a ProcessHandle;
since the cached boot time does not change, the value is stable for that
check.

A workaround for your application might be to save the relative start
time, either by reading the boot time directly from /proc or by reading
the relative start time from /proc.

Roger
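
Roger's first suggestion, reading the boot time directly from /proc, could be sketched as follows. It assumes the "btime" line format documented in proc(5) (boot time in seconds since the Epoch); BootTime and its methods are hypothetical names. Note that a freshly read btime still follows the wall clock, so it only helps to detect a shifted boot time; for a clock-independent key, the relative starttime field used in Markus's code above remains the better candidate.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper that re-reads the boot time instead of caching it once.
final class BootTime {
    private BootTime() {
    }

    // Returns the "btime" value (seconds since the Epoch) from the lines of /proc/stat.
    static OptionalLong parseBtime(List<String> procStatLines) {
        for (String line : procStatLines) {
            if (line.startsWith("btime ")) {
                return OptionalLong.of(Long.parseLong(line.substring("btime ".length()).trim()));
            }
        }
        return OptionalLong.empty();
    }

    // Linux only: reads /proc/stat on every call, so a clock change shows up immediately.
    static OptionalLong readBtime() throws IOException {
        return parseBtime(Files.readAllLines(Paths.get("/proc/stat"), StandardCharsets.UTF_8));
    }
}
```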

On 8/7/19 7:34 AM, Duft Markus wrote:
> Hey,
>
>
> Just checking in to see whether this is a bug or by design (and where I should report it if it /is/ a bug :)).
>
>
> We discovered that ProcessHandle.Info.startInstant() is not constant for a given long-running process across restarts of the Java VM. We have a deployment tool (https://bdeploy.io) which monitors processes and tries to recover information about running processes when it starts up. The general process is:
>
>
> 1) Start BDeploy
>
> 2) BDeploy starts a child process, records its PID and startInstant to be able to find it again later
>
> 3) BDeploy is stopped (for whatever reason) and the child process keeps running (we do make sure that this is the case :)).
>
> 4) BDeploy is started again, tries to find running processes from its previous life to resume monitoring.
>
> 5) BDeploy reads PID and startInstant from a file, finds a ProcessHandle using the PID and compares the startInstant. This is to avoid finding wrong processes when PIDs wrap.
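
Step 5 roughly corresponds to the following sketch using the public ProcessHandle API (StartInstantCheck and isSameProcess are hypothetical names). Because it compares Instant values for exact equality, any change in the boot time the JVM derives at startup makes the check fail for a process that is still running:

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper mirroring step 5: look up a PID recorded earlier and compare
// the recorded start instant with the one the JVM reports now.
final class StartInstantCheck {
    private StartInstantCheck() {
    }

    static boolean isSameProcess(long recordedPid, Instant recordedStart) {
        // ProcessHandle.of returns an empty Optional if no such process exists.
        Optional<ProcessHandle> handle = ProcessHandle.of(recordedPid);
        return handle
                .flatMap(h -> h.info().startInstant()) // may be empty if the OS gives no start time
                .map(recordedStart::equals)
                .orElse(false);
    }
}
```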
>
>
> Now we have the problem that this does not always work. Our analysis has led us to identify NTPD or manual clock changes as the root cause: a system clock change also changes the absolute timestamp of a process start. I had a look at the JDK sources and found that this is actually true :) Here is what I found out and documented in our own bug tracker:
>
>
> The relevant Java native method on Linux does this:
>
>       /*
>       * Try to stat and then open /proc/%d/stat
>       */
>      snprintf(fn, sizeof fn, "/proc/%d/stat", pid);
>
>      fp = fopen(fn, "r");
>      ...
>
>      // Scan the needed fields from status, retaining only ppid(4),
>      // utime (14), stime(15), starttime(22)
>      if (4 != sscanf(s, " %*c %d %*d %*d %*d %*d %*d %*u %*u %*u %*u %lu %lu %*d %*d %*d %*d %*d %*d %llu",
>              &parentPid, &utime, &stime, &start)) {
>          return 0;              // not all values parsed; return error
>      }
>
>      *totalTime = (utime + stime) * (jlong)(1000000000 / clock_ticks_per_second);
>
>      *startTime = bootTime_ms + ((start * 1000) / clock_ticks_per_second);
>      ...
>
>
> So the process start time is calculated relative to the system kernel boot time. The boot time is calculated ONCE when the Java VM starts, like this:
>
>      fp = fopen("/proc/stat", "r");
>      ...
>
>      while (getline(&line, &len, fp) != -1) {
>          if (sscanf(line, "btime %llu", &bootTime) == 1) {
>              break;
>          }
>      }
>      ...
>
>      return bootTime * 1000;
>
>
> However, the /proc/stat btime field does not seem to be constant. When I manually set the clock 2 minutes ahead, the btime field follows along and is now two minutes later than before. Thus any system time correction (manual, ntpd, ...) changes the reported boot time, which makes the computed start timestamp different, but only after the JVM is restarted.
>
> This is IMHO a bug in the JVM. The kernel tracks boot time as a relative value (uptime, in nanoseconds if I'm correct); the absolute btime in /proc/stat (in seconds since the Epoch) is calculated on the fly from the current date/time whenever /proc/stat is read, assuming that the current date/time is correct. That is not a problem (and correct!) for a human reader, but the field should never be used to calculate another absolute timestamp, which Java does...
>
> So, in pseudo-code, we have:
>
>     btime = current-clock-timestamp - kernel-uptime;
>     bootTime_ms = btime * 1000;
>     proc-start = bootTime_ms + process-start-offset;  // starttime field, converted to ms
>
> The inputs keep changing: current-clock-timestamp and kernel-uptime advance all the time. As long as they advance in lockstep, btime, and thus the calculated absolute start time, is reproducible, but only until the system clock is changed.
>
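Plugging example numbers into the formula from the quoted native code shows the effect. This sketch assumes 100 clock ticks per second (a common Linux value) and uses made-up btime and starttime values; StartTimeDrift is a hypothetical name and only the difference between the two results matters:

```java
// Mirrors the quoted JVM formula: startTime = bootTime_ms + (start * 1000) / clock_ticks_per_second.
final class StartTimeDrift {
    private StartTimeDrift() {
    }

    static long absoluteStartMillis(long btimeSeconds, long startTicks, long ticksPerSecond) {
        long bootTimeMs = btimeSeconds * 1000;
        return bootTimeMs + (startTicks * 1000) / ticksPerSecond;
    }

    public static void main(String[] args) {
        long startTicks = 987_654;           // example starttime field, constant for the process
        long hz = 100;                       // assumed clock ticks per second
        long btimeBefore = 1_565_000_000L;   // example btime read by the first JVM
        long btimeAfter = btimeBefore + 120; // clock set 2 minutes ahead, btime follows
        long before = absoluteStartMillis(btimeBefore, startTicks, hz);
        long after = absoluteStartMillis(btimeAfter, startTicks, hz);
        System.out.println(after - before);  // prints 120000: the "absolute" start moved by 2 minutes
    }
}
```

The relative starttime value is identical in both runs; only the derived absolute value moves.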
>
> I'm pretty sure this qualifies as a bug, but I'm not sure whether it's a Java or a kernel (or whatever) bug... Any advice, including how to work around this, would be greatly appreciated. I would really like to avoid hard-coding allowed drift ranges or something like this, especially as we'll be running into time zone, daylight saving time, etc. issues AFAICT...
>
>
> Cheers,
>
> Thanks for ANY help,
>
> Markus
>
> SSI Schäfer IT Solutions GmbH | Friesachstrasse 15 | 8114 Friesach | Austria
> Registered Office: Friesach | Commercial Register: 49324 K | VAT no. ATU28654300
> Commercial Court: Landesgericht für Zivilrechtssachen Graz
> Our information on the handling of personal data (in German) can be found here<https://www.ssi-schaefer.com/de-at/datenschutz-49548>.
> You can find our information on the handling of personal data here<https://www.ssi-schaefer.com/en-at/privacy-13258>.

