[crac] RFR: Correct System.nanotime() value after restore [v2]

Anton Kozlov akozlov at openjdk.org
Thu Mar 30 11:28:56 UTC 2023


On Thu, 30 Mar 2023 07:55:35 GMT, Radim Vansa <duke at openjdk.org> wrote:

>> There are various places both inside JDK and in libraries that rely on monotonicity of `System.nanoTime()`. When the process is restored on a different machine, the value will likely differ, as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanoTime() to a reasonably correct value.
>
> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Correct time since restore

src/hotspot/share/runtime/os.cpp line 2041:

> 2039:       checkpoint_millis = javaTimeMillis();
> 2040:       checkpoint_nanos = javaTimeNanos();
> 2041:   }

Sorry, I don't follow why it's enough or desirable to store checkpoint_nanos only once for repeated checkpoint-restore. Suppose the first restore happens on the same machine that was used for the checkpoint (the millis diff roughly equals the nanos diff, so the time adjustment implemented does not contribute anything for the first restore), and then we checkpoint again. Then the 2nd restore is performed on another machine after a very short time, say immediately. The `checkpoint_nanos - javaTimeNanos()` adjustment can then become any value, depending on the difference between the clocks of the two machines, which are completely unrelated to each other.

src/hotspot/share/runtime/os.cpp line 2051:

> 2049:     diff_millis = 0;
> 2050:   }
> 2051:   javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() + diff_millis * 1000000L;

So all the difference in the MONOTONIC clocks is eliminated and replaced with a REALTIME estimate, even if the restore was performed on the same machine and the monotonic clock difference made sense. That may invalidate some measurements done with System.nanoTime() around checkpoint-restore:


long before = System.nanoTime();
// ... checkpoint & restore .. 
long after = System.nanoTime();
System.out.println(after - before);
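
To make the concern concrete, here is a minimal Java sketch of the arithmetic on line 2051 with made-up numbers. The class and method names are hypothetical, and it assumes that the adjusted reading is the raw monotonic value plus `javaTimeNanos_offset`, which the quoted diff does not show.

```java
// Hypothetical model of the offset computed on os.cpp line 2051; not HotSpot code.
class RealtimeOffsetSketch {
    // offset = checkpoint_nanos - javaTimeNanos() + diff_millis * 1000000L
    static long offset(long checkpointNanos, long rawNanosAtRestore, long diffMillis) {
        return checkpointNanos - rawNanosAtRestore + diffMillis * 1_000_000L;
    }

    // Assumption: the adjusted reading is the raw monotonic value plus the offset.
    static long adjusted(long rawNanos, long offset) {
        return rawNanos + offset;
    }

    // Elapsed time an application would observe across checkpoint-restore.
    static long observedElapsed(long checkpointNanos, long rawNanosAtRestore, long diffMillis) {
        long off = offset(checkpointNanos, rawNanosAtRestore, diffMillis);
        return adjusted(rawNanosAtRestore, off) - checkpointNanos;
    }
}
```

With checkpointNanos = 1_000_000_000_000, rawNanosAtRestore = 1_005_000_400_000 and diffMillis = 3_000 (same machine, wall clock stepped back in between), `observedElapsed` is 3_000_000_000 ns. The raw value at restore cancels out, so the 5.0004 s monotonic delta is lost.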


One way to fix that is to fix only the monotonicity of System.nanoTime(), preventing it from going backward.


long diff = javaTimeNanos() - checkpoint_nanos;
if (diff < 0) {
  javaTimeNanos_offset = -diff;
}


But that will fail anyway if the time namespace (timens) has been changed (which we can probably detect) or when the image is transferred to another machine (detecting that should be possible, but is probably trickier). Do we have any way to detect that we've been transferred to another machine and report a warning in this case, with the millis approximation possibly enabled?
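
The monotonicity-only variant and its limits can be sketched in Java as follows. The class and method names are hypothetical, the offset is assumed to stay 0 when the clock did not go backward, and the adjusted reading is assumed to be the raw value plus the offset.

```java
// Hypothetical model of the monotonicity-only adjustment; not HotSpot code.
class MonotonicOffsetSketch {
    // Compensate only when the raw clock at restore is behind the checkpoint value.
    static long offset(long checkpointNanos, long rawNanosAtRestore) {
        long diff = rawNanosAtRestore - checkpointNanos;
        return diff < 0 ? -diff : 0L; // assumption: no adjustment otherwise
    }

    // First adjusted reading after restore, assuming adjusted = raw + offset.
    static long firstReadingAfterRestore(long checkpointNanos, long rawNanosAtRestore) {
        return rawNanosAtRestore + offset(checkpointNanos, rawNanosAtRestore);
    }
}
```

For a checkpoint taken at 1_000 ns, a restore with the raw clock at 1_500 ns keeps the 500 ns delta, and a restore with the raw clock at 200 ns yields 1_000 again, so the value never goes backward. A restore on a machine whose raw clock reads 9_000_000 ns is passed through unchanged, so the apparent elapsed time is whatever the two unrelated clocks happen to differ by.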

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1150348959
PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1150356508

