JVM64bit running on Linux 64bit: when system time changes the JVM may hang (bug_id=6900441)

Andrew Haley aph at redhat.com
Mon Sep 2 07:01:39 PDT 2013


Yabbut, we really can't rely on a kernel bug to get correct behaviour.
If the kernel is doing what it's supposed to do then this behaviour is
exactly what should happen.


On 09/02/2013 01:59 PM, David Holmes wrote:
> Hi Bruno,
> As you note this is a very old issue. The reason it hasn't become a 
> priority to fix is that it didn't actually manifest. In theory it 
> should but in practice some "incorrect" clock handling in the kernel 
> made everything work okay. Jump forward to now and we have already seen 
> reports where this has become a problem on 64-bit but still works okay 
> on 32-bit - which is very puzzling as in theory there should be no 
> difference. My own thoughts are that something has been "fixed" in the 
> 64-bit linux kernel and that this now exposes this issue where 
> previously it did not.
> The basic sleep/wait/park with relative timeouts all use the same 
> underlying mechanism on linux: pthread_cond_timedwait. This takes an 
> absolute time which is currently based on CLOCK_REALTIME. So in theory 
> if the clock is set forward the waits will complete earlier; and if set 
> back they will complete later. But note this is not what was observed in 
> practice.
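
To make the arithmetic in the paragraph above concrete, here is a minimal
sketch of how a relative timeout turns into the absolute deadline that
pthread_cond_timedwait expects. It is not HotSpot code: the helper names
deadline_from_relative and remaining_seconds are hypothetical, and the clock
readings are passed in by hand so that a clock change can be simulated.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical helper (not from HotSpot): convert a relative timeout into an
// absolute deadline, given the current reading of the clock being used.
static timespec deadline_from_relative(const timespec& now, int64_t timeout_ns) {
    assert(timeout_ns >= 0);
    const int64_t NANOS_PER_SEC = 1000LL * 1000 * 1000;
    int64_t total_ns = now.tv_nsec + timeout_ns;
    timespec abs;
    abs.tv_sec = now.tv_sec + total_ns / NANOS_PER_SEC;  // carry whole seconds
    abs.tv_nsec = total_ns % NANOS_PER_SEC;              // keep the remainder
    return abs;
}

// Hypothetical helper: whole seconds still to wait, as seen by a clock that
// currently reads `now`. Zero or negative means the deadline has passed.
static int64_t remaining_seconds(const timespec& deadline, const timespec& now) {
    return static_cast<int64_t>(deadline.tv_sec) - static_cast<int64_t>(now.tv_sec);
}
```

If the deadline is computed from CLOCK_REALTIME and the clock is then set
back by an hour, the remaining wait grows by 3600 seconds; if the clock is
set forward past the deadline, the remaining wait becomes non-positive and
the wait would complete early.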
> The fix is quite straight-forward, assuming the kernel does the right 
> thing - and that is to use pthread_cond_t associated with 
> CLOCK_MONOTONIC.
> But there is a complexity in the park code because that API allows both 
> relative and absolute timeouts and for the absolute case we would have 
> to use a different condition variable to wait on (one using 
> CLOCK_REALTIME as it should be affected by changes to the clock!).
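
A minimal sketch of what a relative timed wait could look like when the
condition variable is bound to CLOCK_MONOTONIC, the clock that the
os::javaTimeNanos() code quoted further down already prefers. This is not
the HotSpot implementation: the MonotonicWaiter type and its members are
hypothetical, and it assumes a platform where pthread_condattr_setclock
accepts CLOCK_MONOTONIC. The absolute-deadline case mentioned above is
deliberately left out, since it would still need a CLOCK_REALTIME condition
variable.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <pthread.h>

// Hypothetical wrapper (not HotSpot code): a mutex/condvar pair whose timed
// waits are measured against CLOCK_MONOTONIC rather than CLOCK_REALTIME.
struct MonotonicWaiter {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool signalled;

    MonotonicWaiter() : signalled(false) {
        pthread_condattr_t attr;
        int status = pthread_condattr_init(&attr);
        assert(status == 0);
        // Bind the condvar to the monotonic clock before it is initialised.
        status = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
        assert(status == 0);
        status = pthread_mutex_init(&mutex, nullptr);
        assert(status == 0);
        status = pthread_cond_init(&cond, &attr);
        assert(status == 0);
        (void)status;
        pthread_condattr_destroy(&attr);
    }

    ~MonotonicWaiter() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }

    // Wake one waiter.
    void signal() {
        pthread_mutex_lock(&mutex);
        signalled = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }

    // Wait up to timeout_ns. Returns 0 if signalled, ETIMEDOUT on timeout.
    int timed_wait(int64_t timeout_ns) {
        const int64_t NANOS_PER_SEC = 1000LL * 1000 * 1000;
        timespec deadline;
        // The deadline must be read from the same clock the condvar uses.
        clock_gettime(CLOCK_MONOTONIC, &deadline);
        int64_t total_ns = deadline.tv_nsec + timeout_ns;
        deadline.tv_sec += total_ns / NANOS_PER_SEC;
        deadline.tv_nsec = total_ns % NANOS_PER_SEC;

        pthread_mutex_lock(&mutex);
        int rc = 0;
        // Loop to absorb spurious wakeups; stop on signal or timeout.
        while (!signalled && rc == 0) {
            rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
        }
        if (signalled) {
            signalled = false;  // consume the signal
            rc = 0;
        }
        pthread_mutex_unlock(&mutex);
        return rc;
    }
};
```

Because both the deadline and the wait use the monotonic clock, setting the
wall clock forward or back should not change how long timed_wait blocks,
provided the kernel and libc honour the clock attribute.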
> I can raise the priority of this but a fix for 8 may not be feasible 
> given the current state of things.
> David Holmes
> On 2/09/2013 9:41 PM, bruno bossola wrote:
>> Hi all,
>> I am posting here after a few message exchanges on the LJC mailing list,
>> from the 7u lead:
>> ===================
>> Looks like an old/known issue. I've seen varying reports around whether
>> this is a linux kernel issue or jvm issue.
>> I'd suggest that Bruno follows up with a question on the
>> hotspot-runtime-dev at openjdk.java.net mailing list [...]
>> ====================
>> These days my teams are hitting a bug in the 64-bit JVM on 64-bit
>> Linux: "...there is bug in JVM for overall scheduling during System time
>> changes backward, which also impacts very basic Object.wait &
>> Thread.sleep methods. It becomes too risky to keep Java App running when
>> system time switches back by even certain seconds. You never know what
>> your Java App will end up to." (source: stackoverflow.com)
>> These are some of the consequences:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7139684
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6311057
>> The original bug is private, but I was told it's a P4 that unfortunately
>> is not being looked after and simply gets shifted from one release to the
>> next:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6900441
>> See also here for a stackoverflow drill:
>> http://stackoverflow.com/questions/9044423/java-scheduler-which-is-completely-independent-of-system-time-changes
>> This bug is NOT fixed in the latest JVM, so the recommended course of
>> action is to restart the VM if a big time jump happens (on small jumps
>> the JVM will catch up). This happens consistently on a 64-bit VM
>> running on a 64-bit Linux system, regardless of the monotonicity of the
>> underlying OS (at least apparently).
>> Note that this should not happen for primitives such as
>> System.nanoTime() (like the queue used internally for ScheduledExecutor)
>> that should work correctly in presence of a monotonic system:
>> jlong os::javaTimeNanos() {
>>    if (Linux::supports_monotonic_clock()) {
>>      struct timespec tp;
>>      int status = Linux::clock_gettime(CLOCK_MONOTONIC, &tp);
>>      assert(status == 0, "gettime error");
>>      jlong result = jlong(tp.tv_sec) * (1000 * 1000 * 1000) + jlong(tp.tv_nsec);
>>      return result;
>>    } else {
>>      timeval time;
>>      int status = gettimeofday(&time, NULL);
>>      assert(status != -1, "linux error");
>>      jlong usecs = jlong(time.tv_sec) * (1000 * 1000) + jlong(time.tv_usec);
>>      return 1000 * usecs;
>>    }
>> }
>> Unfortunately, for some reason, this is not the case on a 1.6+ 64-bit VM on
>> 64-bit Linux. To make the issue clearer, and to show that it also extends to
>> the concurrency library, let me introduce this very simple program:
>> import java.util.concurrent.locks.LockSupport;
>> public class Main {
>>      public static void main(String[] args) {
>>          for (int i=100; i>0; i--) {
>>              System.out.println(i);
>>              LockSupport.parkNanos(1000L*1000L*1000L);
>>          }
>>          System.out.println("Done!");
>>      }
>> }
>> While running it with a 64-bit 1.6+ JVM on 64-bit Linux, set the clock
>> back one hour and wait until the counter stops... magic!  I tested this
>> on JDK6, JDK7 and the latest JDK8 beta running on various Ubuntu distros.
>> It's not just a matter of the (old?) sleep() and wait() primitives, it also
>> affects the new concurrency library. Please note that classic sleep()
>> works correctly on JDK1.4: to me that qualifies this bug as a regression,
>> and the fact that it has been there for at least 7 years troubles me.
>> This is something we cannot easily manage as our software is installed
>> on-premises at our customers' sites, hence we have no control at all over
>> time changes: if our application hangs, we are in big trouble.
>> I'd really like to get your view on the matter.
>> Thanks in advance,
>>      Bruno
