JVM64bit running on Linux 64bit: when system time changes the JVM may hang (bug_id=6900441)

bruno bossola bbossola at gmail.com
Mon Sep 2 07:02:50 PDT 2013


Hi David,

thanks for your answer, it clarifies the matter.


> My own thoughts are that something has been "fixed" in the 64-bit linux
> kernel and that this now exposes this issue where previously it did not.
>
If I recall correctly, the Linux kernel did not always behave properly,
and for that reason the JVM was double-checking the time itself using
gettimeofday. This "fix" is now affecting the correct behaviour.


> The fix is quite straight-forward, assuming the kernel does the right
> thing - and that is to use pthread_cond_t associated with CLOCK_MONOTONIC,
> or even better CLOCK_MONOTONIC_RAW.
> But there is a complexity in the park code because that API allows both
> relative and absolute timeouts and for the absolute case we would have to
> use a different condition variable to wait on (one using CLOCK_REALTIME as
> it should be affected by changes to the clock!).
>
Looks like an if() to me: it should use the old code when the timeout is
absolute and the fixed code when it is relative. Am I missing something?
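
To make the if() idea concrete, here is a minimal sketch in C. It is not
HotSpot code: parker_t, parker_init and parker_park are hypothetical names,
and I am assuming a park-style convention where an absolute timeout is given
in milliseconds since the epoch and a relative one in nanoseconds. It also
assumes pthread_condattr_setclock() accepts CLOCK_MONOTONIC on the target
system. It keeps one condition variable per clock and picks between them:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define NANOS_PER_SEC 1000000000LL

/* Hypothetical parker (illustrative only): one condvar per clock. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond_rel;  /* CLOCK_MONOTONIC: relative timeouts */
    pthread_cond_t  cond_abs;  /* CLOCK_REALTIME: absolute deadlines */
} parker_t;

/* Returns 0 on success, or a pthread error number. */
int parker_init(parker_t *p) {
    pthread_condattr_t attr;
    int rc = pthread_mutex_init(&p->mutex, NULL);
    if (rc != 0) return rc;
    rc = pthread_condattr_init(&attr);
    if (rc != 0) return rc;
    /* Relative waits must ignore wall-clock changes. */
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0) rc = pthread_cond_init(&p->cond_rel, &attr);
    pthread_condattr_destroy(&attr);
    if (rc != 0) return rc;
    /* Default attributes use CLOCK_REALTIME, which absolute waits want. */
    return pthread_cond_init(&p->cond_abs, NULL);
}

/*
 * Assumed convention: if is_absolute, `when` is milliseconds since the
 * epoch; otherwise it is a non-negative number of nanoseconds from now.
 * Returns ETIMEDOUT on timeout, 0 on a (possibly spurious) wakeup.
 */
int parker_park(parker_t *p, bool is_absolute, long long when) {
    struct timespec deadline;
    pthread_cond_t *cond;
    int rc;

    assert(when >= 0);
    if (is_absolute) {
        /* The "old code": deadline follows the wall clock. */
        cond = &p->cond_abs;
        deadline.tv_sec  = (time_t)(when / 1000);
        deadline.tv_nsec = (long)(when % 1000) * 1000000L;
    } else {
        /* The "fixed code": deadline is immune to clock changes. */
        cond = &p->cond_rel;
        if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0) return errno;
        deadline.tv_sec  += (time_t)(when / NANOS_PER_SEC);
        deadline.tv_nsec += (long)(when % NANOS_PER_SEC);
        if (deadline.tv_nsec >= NANOS_PER_SEC) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= NANOS_PER_SEC;
        }
    }

    pthread_mutex_lock(&p->mutex);
    rc = pthread_cond_timedwait(cond, &p->mutex, &deadline);
    pthread_mutex_unlock(&p->mutex);
    return rc;
}
```

A real parker would also need a permit flag and an unpark that signals
whichever condition variable the waiter is blocked on; the sketch only shows
the clock selection.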


> I can raise the priority of this but a fix for 8 may not be feasible
> given the current state of things
>
That's really unfortunate. My guess is that if this thing goes public or
viral, Java will be in big trouble. Wearing my hat as VP of Engineering at
my company, I am seriously considering alternatives.


Do you have any workaround to suggest? Can you send me some code that I or
my team could use to try patching the native libraries?

Cheers,

    Bruno



On Mon, Sep 2, 2013 at 1:59 PM, David Holmes <david.holmes at oracle.com> wrote:

> Hi Bruno,
>
> As you note this is a very old issue. The reason it hasn't become a
> priority to fix was because it didn't actually manifest. In theory it
> should but in practice some "incorrect" clock handling in the kernel made
> everything work okay. Jump forward to now and we have already seen reports
> where this has become a problem on 64-bit but still works okay on 32-bit -
> which is very puzzling as in theory there should be no difference. My own
> thoughts are that something has been "fixed" in the 64-bit linux kernel and
> that this now exposes this issue where previously it did not.
>
> The basic sleep/wait/park with relative timeouts all use the same
> underlying mechanism on linux: pthread_cond_timedwait. This takes an
> absolute time which is currently based on CLOCK_REALTIME. So in theory if
> the clock is set forward the waits will complete earlier; and if set back
> they will complete later. But note this is not what was observed in
> practice.
>
> The fix is quite straight-forward, assuming the kernel does the right
> thing - and that is to use pthread_cond_t associated with CLOCK_MONOTONIC,
> or even better CLOCK_MONOTONIC_RAW.
>
> But there is a complexity in the park code because that API allows both
> relative and absolute timeouts and for the absolute case we would have to
> use a different condition variable to wait on (one using CLOCK_REALTIME as
> it should be affected by changes to the clock!).
>
> I can raise the priority of this but a fix for 8 may not be feasible given
> the current state of things.
>
> David Holmes
>
>
> On 2/09/2013 9:41 PM, bruno bossola wrote:
>
>> Hi all,
>>
>> I am posting here after a few messages exchanged on the LJC mailing list.
>> This is from the 7u lead:
>>
>> ===================
>> Looks like an old/known issue. I've seen varying reports around whether
>> this is a linux kernel issue or jvm issue.
>> I'd suggest that Bruno follows up with a question on the
>> hotspot-runtime-dev at openjdk.java.net mailing list [...]
>>
>> ====================
>>
>> These days my teams are hitting a bug in the 64-bit JVM on 64-bit
>> Linux: "...there is bug in JVM for overall scheduling during System time
>> changes backward, which also impacts very basic Object.wait &
>> Thread.sleep methods. It becomes too risky to keep Java App running when
>> system time switches back by even certain seconds. You never know what
>> your Java App will end up to." (source: stackoverflow.com)
>>
>>
>> These are some of the consequences:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7139684
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6311057
>>
>> The original bug is private, but I was told it is a P4 that unfortunately
>> is not being looked after and simply gets shifted from one release to the
>> next:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6900441
>>
>> See also this Stack Overflow thread for a deeper drill-down:
>> http://stackoverflow.com/questions/9044423/java-scheduler-which-is-completely-independent-of-system-time-changes
>>
>> This bug is NOT fixed in the latest JVM, so the recommended course of
>> action is to restart the VM if a big time jump happens (on small jumps
>> the JVM will catch up). It happens consistently on a 64-bit VM running
>> on a 64-bit Linux system, regardless of the monotonicity of the
>> underlying OS (at least apparently).
>>
>> Note that this should not happen for anything built on
>> System.nanoTime() (like the queue used internally by ScheduledExecutor),
>> which should work correctly when a monotonic clock is available:
>>
>> jlong os::javaTimeNanos() {
>>    if (Linux::supports_monotonic_clock()) {
>>      struct timespec tp;
>>      int status = Linux::clock_gettime(CLOCK_MONOTONIC, &tp);
>>      assert(status == 0, "gettime error");
>>      jlong result = jlong(tp.tv_sec) * (1000 * 1000 * 1000) + jlong(tp.tv_nsec);
>>      return result;
>>    } else {
>>      timeval time;
>>      int status = gettimeofday(&time, NULL);
>>      assert(status != -1, "linux error");
>>      jlong usecs = jlong(time.tv_sec) * (1000 * 1000) + jlong(time.tv_usec);
>>      return 1000 * usecs;
>>    }
>> }
>>
>> Unfortunately, for some reason, this is not the case on a 1.6+ 64-bit VM
>> on 64-bit Linux. To make the issue, its extent and its effect on the
>> concurrency library clearer, let me introduce this very simple program:
>>
>> import java.util.concurrent.locks.LockSupport;
>>
>> public class Main {
>>
>>      public static void main(String[] args) {
>>
>>          for (int i=100; i>0; i--) {
>>              System.out.println(i);
>>              LockSupport.parkNanos(1000L*1000L*1000L);
>>          }
>>
>>          System.out.println("Done!");
>>      }
>> }
>>
>> While running it with a 64-bit 1.6+ JVM on 64-bit Linux, set the clock
>> back one hour and wait until the counter stops... magic! I tested this
>> on JDK6, JDK7 and the latest JDK8 beta running on various Ubuntu distros.
>> It's not just a matter of the (old?) sleep() and wait() primitives: it
>> also affects the new concurrency library. Please note that classic sleep()
>> works correctly on JDK1.4, which qualifies this bug as a regression to me,
>> and the fact that it has been there for at least 7 years rather troubles me.
>>
>> This is something we cannot easily manage, as our software is installed
>> on-premises at our customers' sites, so we have no control at all over
>> time changes: if our application hangs, we are in big trouble.
>>
>> I'd really like to get your view on the matter.
>>
>> Thanks in advance,
>>
>>      Bruno
>>
>>
>>