JVM64bit running on Linux 64bit: when system time changes the JVM may hang (bug_id=6900441)

David Holmes david.holmes at oracle.com
Mon Sep 2 15:13:23 PDT 2013


On 3/09/2013 12:02 AM, bruno bossola wrote:
> Hi David,
>
> thanks for your answer, it clarifies the matter.
>
>
>  > My own thoughts are that something has been "fixed" in the 64-bit
>  > linux kernel and that this now exposes this issue where previously
>  > it did not.
>  >
> If I recall correctly, the Linux kernel was not always behaving, and
> for that reason the JVM was double-checking outside using timeofday.
> This "fix" is now affecting the correct behaviour.

No I don't think so. There have been a lot of bugs in this area. One 
issue we tried to address was the "early return from sleep/wait", by 
checking actual elapsed time rather than trusting the timed routines. 
But that only dealt with forward time jumps.
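
To illustrate the kind of elapsed-time check described above, here is a
minimal sketch. It is not the HotSpot code: the helper names are
hypothetical, and measuring elapsed time with CLOCK_MONOTONIC is an
assumption made for the example. It only guards against returning too
early; it does nothing for a wait that is stretched by a backward jump.

```cpp
#include <time.h>

// Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
static long long monotonic_nanos() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: sleep for at least `millis` milliseconds and
// return the elapsed time that was actually measured, in nanoseconds.
long long sleep_at_least_ms(long millis) {
    const long long requested = (long long)millis * 1000000LL;
    const long long start = monotonic_nanos();
    long long elapsed = 0;
    while (elapsed < requested) {
        const long long remaining = requested - elapsed;
        struct timespec req;
        req.tv_sec  = remaining / 1000000000LL;
        req.tv_nsec = remaining % 1000000000LL;
        // The timed routine may return early (for example on a signal),
        // so its result is not trusted; elapsed time is re-measured.
        nanosleep(&req, nullptr);
        elapsed = monotonic_nanos() - start;
    }
    return elapsed;
}
```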

As I said in my other email, I don't yet know what exactly has changed in 
the 64-bit kernel to expose this issue.

>  > The fix is quite straight-forward, assuming the kernel does the right
>  > thing - and that is to use pthread_cond_t associated with
>  > CLOCK_MONOTONIC, or even better CLOCK_MONOTONIC_RAW.
>  > But there is a complexity in the park code because that API allows
>  > both relative and absolute timeouts and for the absolute case we
>  > would have to use a different condition variable to wait on (one
>  > using CLOCK_REALTIME as it should be affected by changes to the
>  > clock!).
>  >
> Looks like an if() to me: it should use the old code when absolute and
> the fixed code when relative. Am I missing something?

There have to be two different condition variables associated with the 
same mutex, that form the combined park/unpark implementation. If you do 
a relative timed park you wait on one, if an absolute timed park then 
you wait on the other - they each use different clocks. The unpark code 
has to know which one to signal or redundantly signal both. Not terribly 
hard just more complex than simply switching a clock.
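
A rough sketch of that arrangement follows. It is an illustration under
stated assumptions, not the actual park code: the Parker struct and the
parker_* functions are hypothetical names, the units (nanoseconds for
relative, milliseconds since the epoch for absolute) are chosen for the
example, and unpark takes the "redundantly signal both" option.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Hypothetical parker: one mutex, one permit, two condition variables.
struct Parker {
    pthread_mutex_t mutex;
    pthread_cond_t  rel_cond;  // relative timeouts, on CLOCK_MONOTONIC
    pthread_cond_t  abs_cond;  // absolute deadlines, on CLOCK_REALTIME
    bool            permit;
};

void parker_init(Parker* p) {
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);
    assert(rc == 0);
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    assert(rc == 0);
    rc = pthread_cond_init(&p->rel_cond, &attr);
    assert(rc == 0);
    pthread_condattr_destroy(&attr);
    // Default attributes: the condvar measures time on CLOCK_REALTIME.
    rc = pthread_cond_init(&p->abs_cond, nullptr);
    assert(rc == 0);
    rc = pthread_mutex_init(&p->mutex, nullptr);
    assert(rc == 0);
    (void)rc;  // silence unused warning when asserts are compiled out
    p->permit = false;
}

// Returns true if a permit was consumed, false if the wait ended
// without one. Relative: `t` is nanoseconds from now.
// Absolute: `t` is milliseconds since the epoch.
bool parker_park(Parker* p, bool is_absolute, long long t) {
    struct timespec deadline;
    pthread_cond_t* cond;
    if (is_absolute) {
        cond = &p->abs_cond;
        deadline.tv_sec  = t / 1000;
        deadline.tv_nsec = (t % 1000) * 1000000L;
    } else {
        cond = &p->rel_cond;
        clock_gettime(CLOCK_MONOTONIC, &deadline);
        deadline.tv_sec  += t / 1000000000LL;
        deadline.tv_nsec += t % 1000000000LL;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }
    }
    pthread_mutex_lock(&p->mutex);
    int rc = 0;
    // Stop on a permit, or on any non-zero result (e.g. ETIMEDOUT).
    while (!p->permit && rc == 0) {
        rc = pthread_cond_timedwait(cond, &p->mutex, &deadline);
    }
    const bool got_permit = p->permit;
    p->permit = false;
    pthread_mutex_unlock(&p->mutex);
    return got_permit;
}

void parker_unpark(Parker* p) {
    pthread_mutex_lock(&p->mutex);
    p->permit = true;
    // This sketch does not track which condvar the waiter chose,
    // so it signals both of them.
    pthread_cond_signal(&p->rel_cond);
    pthread_cond_signal(&p->abs_cond);
    pthread_mutex_unlock(&p->mutex);
}
```

Tracking which condition variable the parked thread is waiting on would
avoid the second signal, at the cost of one more field under the mutex.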

>  > I can raise the priority of this but a fix for 8 may not be
>  > feasible given the current state of things
>  >
> That's really unfortunate. My guess is that if this thing goes public or
> viral Java will be in big trouble. With the "lens" of VP of Engineering
> in my company I am really considering alternatives.
>
>
> Do you have any workaround to suggest? Can you send me some code I/my
> team can try to patch the native libraries?

I started to prototype the fix for this years ago. I'll see if I can 
revive the webrev. The basic change to use CLOCK_MONOTONIC is not hard. 
Using CLOCK_MONOTONIC_RAW may be harder (we haven't switched to that yet 
because our official build platforms haven't supported it).
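
Since platform support is the open question, one way to find out what a
given system accepts is to probe at run time. The two functions below
are hypothetical helpers written for this illustration, not existing
HotSpot code; they only report whether the calls succeed on the machine
where they run.

```cpp
#include <pthread.h>
#include <time.h>

// Hypothetical probe: can a condition variable be bound to `clock`?
bool condvar_accepts_clock(clockid_t clock) {
    pthread_condattr_t attr;
    if (pthread_condattr_init(&attr) != 0) {
        return false;
    }
    const bool ok = (pthread_condattr_setclock(&attr, clock) == 0);
    pthread_condattr_destroy(&attr);
    return ok;
}

// Hypothetical probe: can `clock` be read at all on this system?
bool clock_is_readable(clockid_t clock) {
    struct timespec ts;
    return clock_gettime(clock, &ts) == 0;
}
```

Any probe of CLOCK_MONOTONIC_RAW would need to sit inside an
#ifdef CLOCK_MONOTONIC_RAW guard, because older headers may not define
that constant at all.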

I'll see what I can put up. But note this is not part of my day job.

Cheers,
David


> Cheers,
>
>      Bruno
>
>
>
> On Mon, Sep 2, 2013 at 1:59 PM, David Holmes
> <david.holmes at oracle.com> wrote:
>
>     Hi Bruno,
>
>     As you note this is a very old issue. The reason it hasn't become a
>     priority to fix was because it didn't actually manifest. In theory
>     it should but in practice some "incorrect" clock handling in the
>     kernel made everything work okay. Jump forward to now and we have
>     already seen reports where this has become a problem on 64-bit but
>     still works okay on 32-bit - which is very puzzling as in theory
>     there should be no difference. My own thoughts are that something
>     has been "fixed" in the 64-bit linux kernel and that this now
>     exposes this issue where previously it did not.
>
>     The basic sleep/wait/park with relative timeouts all use the same
>     underlying mechanism on linux: pthread_cond_timedwait. This takes an
>     absolute time which is currently based on CLOCK_REALTIME. So in
>     theory if the clock is set forward the waits will complete earlier;
>     and if set back they will complete later. But note this is not what
>     was observed in practice.
>
>     The fix is quite straight-forward, assuming the kernel does the
>     right thing - and that is to use pthread_cond_t associated with
>     CLOCK_MONOTONIC, or even better CLOCK_MONOTONIC_RAW.
>
>     But there is a complexity in the park code because that API allows
>     both relative and absolute timeouts and for the absolute case we
>     would have to use a different condition variable to wait on (one
>     using CLOCK_REALTIME as it should be affected by changes to the clock!).
>
>     I can raise the priority of this but a fix for 8 may not be feasible
>     given the current state of things.
>
>     David Holmes
>
>
>     On 2/09/2013 9:41 PM, bruno bossola wrote:
>
>         Hi all,
>
>         I am posting here after a few message exchanges on the LJC
>         mailing list; this is from the 7u lead:
>
>         ===================
>         Looks like an old/known issue. I've seen varying reports around
>         whether this is a linux kernel issue or jvm issue.
>         I'd suggest that Bruno follows up with a question on the
>         hotspot-runtime-dev at openjdk.java.net mailing list [...]
>
>         ====================
>
>         These days my teams are hitting a bug on the 64bit JVM on 64bit
>         Linux: "...there is bug in JVM for overall scheduling during
>         System time changes backward, which also impacts very basic
>         Object.wait & Thread.sleep methods. It becomes too risky to keep
>         Java App running when system time switches back by even certain
>         seconds. You never know what your Java App will end up to."
>         (source: stackoverflow.com)
>
>
>         These are some of the consequences:
>         http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7139684
>         http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6311057
>
>         The original bug is private, but I was told it's a P4 that
>         unfortunately is not looked after and simply gets shifted from
>         one release to the next:
>         http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6900441
>
>         See also here for a stackoverflow drill:
>         http://stackoverflow.com/questions/9044423/java-scheduler-which-is-completely-independent-of-system-time-changes
>
>         This bug is NOT fixed in the latest JVM, so the recommended
>         course of action is to restart the VM if a big time jump happens
>         (on small jumps the JVM will catch up). This is consistently
>         happening on a 64bit VM when used on a 64bit Linux system,
>         regardless of the monotonicity of the underlying OS (at least
>         apparently).
>
>         Note that this should not happen for primitives such as
>         System.nanoTime() (like the queue used internally for
>         ScheduledExecutor)
>         that should work correctly in presence of a monotonic system:
>
>         jlong os::javaTimeNanos() {
>             if (Linux::supports_monotonic_clock()) {
>               struct timespec tp;
>               int status = Linux::clock_gettime(CLOCK_MONOTONIC, &tp);
>               assert(status == 0, "gettime error");
>               jlong result = jlong(tp.tv_sec) * (1000 * 1000 * 1000) + jlong(tp.tv_nsec);
>               return result;
>             } else {
>               timeval time;
>               int status = gettimeofday(&time, NULL);
>               assert(status != -1, "linux error");
>               jlong usecs = jlong(time.tv_sec) * (1000 * 1000) + jlong(time.tv_usec);
>               return 1000 * usecs;
>             }
>         }
>
>         Unfortunately, for some reason, this is not the case on a 1.6+
>         64bit VM on 64bit Linux. Furthermore, to make the issue and its
>         extent clearer, including its effect on the concurrency
>         library, let me introduce this very simple program:
>
>         import java.util.concurrent.locks.LockSupport;
>
>         public class Main {
>
>             public static void main(String[] args) {
>
>                 for (int i=100; i>0; i--) {
>                     System.out.println(i);
>                     LockSupport.parkNanos(1000L*1000L*1000L);
>                 }
>
>                 System.out.println("Done!");
>             }
>         }
>
>         While running it with a 64bit 1.6+ JVM on 64bit Linux, set the
>         clock back one hour and wait until the counter stops... magic!
>         I tested this on JDK6, JDK7 and the latest JDK8 beta running on
>         various Ubuntu distros. It's not just a matter of the (old?)
>         sleep() and wait() primitives, it also affects the new
>         concurrency library. Please note that classic sleep() works
>         correctly on JDK1.4: that qualifies this bug as a regression to
>         me, and the fact that it has been there for at least 7 years
>         kind of troubles me.
>
>         This is something we cannot easily manage as our software is
>         installed on-premises at our customers' sites, hence we have no
>         control at all over time changes: if our application hangs, we
>         are in big trouble.
>
>         I'd really like to get your view on the matter.
>
>         Thanks in advance,
>
>               Bruno
>
>
>

