JVM64bit running on Linux 64bit: when system time changes the JVM may hang (bug_id=6900441)

David Holmes david.holmes at oracle.com
Mon Sep 2 05:59:17 PDT 2013


Hi Bruno,

As you note this is a very old issue. It hasn't become a priority to
fix because it didn't actually manifest. In theory it should, but in
practice some "incorrect" clock handling in the kernel made everything
work okay. Jump forward to now and we have already seen reports where
this has become a problem on 64-bit but still works okay on 32-bit,
which is very puzzling as in theory there should be no difference. My
own thoughts are that something has been "fixed" in the 64-bit Linux
kernel and that this now exposes the issue where previously it did
not.

The basic sleep/wait/park with relative timeouts all use the same
underlying mechanism on Linux: pthread_cond_timedwait. This takes an
absolute time which is currently based on CLOCK_REALTIME. So in theory
if the clock is set forward the waits will complete earlier; and if set 
back they will complete later. But note this is not what was observed in 
practice.
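
To make the "in theory" part concrete, here is a minimal sketch of the
arithmetic involved. It is not HotSpot code; deadline_after and
nanos_until are hypothetical helper names, and the timeout is assumed
to be non-negative. A relative timeout is turned into an absolute
deadline once, and whatever the clock reports afterwards decides how
much waiting is left.

```cpp
#include <cstdint>
#include <time.h>

const int64_t kNanosPerSec = 1000LL * 1000 * 1000;

// Hypothetical helper: convert a relative timeout (rel_nanos >= 0) into the
// absolute deadline that pthread_cond_timedwait expects, given a reading
// 'now' of the clock the condition variable is measured against.
timespec deadline_after(const timespec& now, int64_t rel_nanos) {
  int64_t nsec = static_cast<int64_t>(now.tv_nsec) + rel_nanos;
  timespec deadline;
  deadline.tv_sec = now.tv_sec + static_cast<time_t>(nsec / kNanosPerSec);
  deadline.tv_nsec = static_cast<long>(nsec % kNanosPerSec);
  return deadline;
}

// Hypothetical helper: nanoseconds a waiter still has to wait if the clock
// currently reads 'now'. Zero or negative means the deadline has passed.
int64_t nanos_until(const timespec& deadline, const timespec& now) {
  return (static_cast<int64_t>(deadline.tv_sec) - now.tv_sec) * kNanosPerSec +
         (static_cast<int64_t>(deadline.tv_nsec) - now.tv_nsec);
}
```

If the clock is stepped back by an hour after the deadline is computed,
nanos_until grows by an hour; if it is stepped forward, the result goes
negative and the wait would end immediately.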

The fix is quite straightforward, assuming the kernel does the right
thing - and that is to use a pthread_cond_t associated with
CLOCK_MONOTONIC, or even better CLOCK_MONOTONIC_RAW.
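
As a rough illustration of that approach (a sketch, not the actual
HotSpot change), a condition variable can be bound to a clock with
pthread_condattr_setclock before it is initialised. wait_monotonic is a
hypothetical helper name. The sketch sticks to CLOCK_MONOTONIC; whether
a given libc accepts CLOCK_MONOTONIC_RAW in this call would need
checking.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <pthread.h>
#include <time.h>

// Hypothetical helper (not HotSpot code): block the caller for roughly
// rel_nanos (assumed >= 0) on a condition variable whose timeouts are
// measured against CLOCK_MONOTONIC. Nobody signals the condvar, so the
// wait is expected to end with ETIMEDOUT, which is returned.
int wait_monotonic(int64_t rel_nanos) {
  const int64_t kNanosPerSec = 1000LL * 1000 * 1000;

  // Bind the condition variable to the monotonic clock.
  pthread_condattr_t attr;
  int rc = pthread_condattr_init(&attr);
  assert(rc == 0);
  rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
  assert(rc == 0);

  pthread_cond_t cond;
  rc = pthread_cond_init(&cond, &attr);
  assert(rc == 0);
  pthread_condattr_destroy(&attr);

  pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

  // The deadline must be read from the same clock the condvar uses.
  timespec deadline;
  rc = clock_gettime(CLOCK_MONOTONIC, &deadline);
  assert(rc == 0);
  int64_t nsec = static_cast<int64_t>(deadline.tv_nsec) + rel_nanos;
  deadline.tv_sec += static_cast<time_t>(nsec / kNanosPerSec);
  deadline.tv_nsec = static_cast<long>(nsec % kNanosPerSec);

  pthread_mutex_lock(&mutex);
  do {
    rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
  } while (rc == 0);  // a return of 0 here can only be a spurious wakeup
  pthread_mutex_unlock(&mutex);

  pthread_cond_destroy(&cond);
  pthread_mutex_destroy(&mutex);
  return rc;
}
```

Because both the deadline and the wait use CLOCK_MONOTONIC, setting the
wall clock forward or back while the thread is blocked should not
change when it wakes up.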

But there is a complexity in the park code because that API allows both 
relative and absolute timeouts and for the absolute case we would have 
to use a different condition variable to wait on (one using 
CLOCK_REALTIME as it should be affected by changes to the clock!).
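
The split could be sketched roughly as follows. This is not the
HotSpot implementation; ParkDeadline and park_deadline are hypothetical
names, and the units are an assumption modelled on
LockSupport.parkNanos (relative, nanoseconds) and LockSupport.parkUntil
(absolute, milliseconds since the epoch).

```cpp
#include <cstdint>
#include <time.h>

// Hypothetical result type: the clock the wait must be measured against
// and the absolute deadline on that clock.
struct ParkDeadline {
  clockid_t clock;
  timespec when;
};

// Hypothetical helper. Assumes time >= 0; relative times are nanoseconds,
// absolute times are milliseconds since the epoch.
ParkDeadline park_deadline(bool is_absolute, int64_t time,
                           const timespec& monotonic_now) {
  const int64_t kNanosPerSec = 1000LL * 1000 * 1000;
  const int64_t kMillisPerSec = 1000;
  const int64_t kNanosPerMilli = 1000LL * 1000;

  ParkDeadline d;
  if (is_absolute) {
    // An absolute deadline is a wall-clock time, so it should follow
    // changes to the system clock: wait on a CLOCK_REALTIME condvar.
    d.clock = CLOCK_REALTIME;
    d.when.tv_sec = static_cast<time_t>(time / kMillisPerSec);
    d.when.tv_nsec = static_cast<long>((time % kMillisPerSec) * kNanosPerMilli);
  } else {
    // A relative timeout is a duration, so it should ignore changes to
    // the system clock: wait on a CLOCK_MONOTONIC condvar.
    d.clock = CLOCK_MONOTONIC;
    int64_t nsec = static_cast<int64_t>(monotonic_now.tv_nsec) + time;
    d.when.tv_sec = monotonic_now.tv_sec + static_cast<time_t>(nsec / kNanosPerSec);
    d.when.tv_nsec = static_cast<long>(nsec % kNanosPerSec);
  }
  return d;
}
```

The caller would then wait on whichever of its two condition variables
was initialised with d.clock, which is where the extra complexity in
the park code comes from.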

I can raise the priority of this but a fix for 8 may not be feasible 
given the current state of things.

David Holmes

On 2/09/2013 9:41 PM, bruno bossola wrote:
> Hi all,
>
> I am posting here after a brief exchange on the LJC mailing list; the
> following is from the 7u lead:
>
> ===================
> Looks like an old/known issue. I've seen varying reports around whether
> this is a linux kernel issue or jvm issue.
> I'd suggest that Bruno follows up with a question on the
> hotspot-runtime-dev at openjdk.java.net
> <mailto:hotspot-runtime-dev at openjdk.java.net> mailing list [...]
> ====================
>
> These days my teams are hitting a bug on the 64-bit JVM on 64-bit
> Linux: "...there is bug in JVM for overall scheduling during System time
> changes backward, which also impacts very basic Object.wait &
> Thread.sleep methods. It becomes too risky to keep Java App running when
> system time switches back by even certain seconds. You never know what
> your Java App will end up to." (source: stackoverflow.com
> <http://stackoverflow.com>)
>
> These are some of the consequences:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7139684
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6311057
>
> The original bug is private, but I was told it's a P4 that
> unfortunately is not looked after and simply gets shifted from one
> release to the next:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6900441
>
> See also this Stack Overflow discussion:
> http://stackoverflow.com/questions/9044423/java-scheduler-which-is-completely-independent-of-system-time-changes
>
> This bug is NOT fixed in the latest JVM, so the recommended course of
> action is to restart the VM if a big time jump happens (on small jumps
> the JVM will catch up). This happens consistently on a 64-bit VM
> running on a 64-bit Linux system, regardless of the monotonicity of the
> underlying OS (at least apparently).
>
> Note that this should not happen for primitives based on
> System.nanoTime() (such as the queue used internally by
> ScheduledExecutor), which should work correctly in the presence of a
> monotonic clock:
>
> jlong os::javaTimeNanos() {
>    if (Linux::supports_monotonic_clock()) {
>      struct timespec tp;
>      int status = Linux::clock_gettime(CLOCK_MONOTONIC, &tp);
>      assert(status == 0, "gettime error");
>      jlong result = jlong(tp.tv_sec) * (1000 * 1000 * 1000) +
>                     jlong(tp.tv_nsec);
>      return result;
>    } else {
>      timeval time;
>      int status = gettimeofday(&time, NULL);
>      assert(status != -1, "linux error");
>      jlong usecs = jlong(time.tv_sec) * (1000 * 1000) + jlong(time.tv_usec);
>      return 1000 * usecs;
>    }
> }
>
> Unfortunately, for some reason, this is not the case on a 1.6+ 64-bit
> VM on 64-bit Linux. To make the issue, its extent, and its effect on the
> concurrency library clearer, here is a very simple program:
>
> import java.util.concurrent.locks.LockSupport;
>
> public class Main {
>
>      public static void main(String[] args) {
>
>          for (int i=100; i>0; i--) {
>              System.out.println(i);
>              LockSupport.parkNanos(1000L*1000L*1000L);
>          }
>
>          System.out.println("Done!");
>      }
> }
>
> While running it with a 64-bit 1.6+ JVM on 64-bit Linux, set the clock
> back one hour and wait until the counter stops... magic! I tested this
> on JDK6, JDK7 and the latest JDK8 beta running on various Ubuntu distros.
> It's not just a matter of the (old?) sleep() and wait() primitives; it
> also affects the new concurrency library. Please note that classic
> sleep() works correctly on JDK1.4: to me that qualifies this bug as a
> regression, and the fact that it has been there for at least 7 years
> troubles me.
>
> This is something we cannot easily manage, as our software is installed
> on-premises at our customers' sites, so we have no control at all over
> time changes: if our application hangs, we are in big trouble.
>
> I'd really like to get your view on the matter.
>
> Thanks in advance,
>
>      Bruno
>
>


More information about the hotspot-runtime-dev mailing list