JVM64bit running on Linux 64bit: when system time changes the JVM may hang (bug_id=6900441)

Dmitry Samersoff dmitry.samersoff at oracle.com
Tue Sep 3 01:20:25 PDT 2013


David,

> There's a lot of history here.

Yes, the problem is as old as *NIX itself. I have a twenty-year-old
patch for ntpd that keeps it from setting the clock backward.

In some cases it's possible to work around the problem on the client side
by using adjtimex:

http://www.starling-software.com/en/blog/sysadmin/2010/04/04.making-ntp-work-on-hardware-with-large-clock-drift.html
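
As a rough illustration of the adjtimex interface, here is a minimal
sketch that only reads the kernel's clock discipline state and changes
nothing. It assumes Linux with glibc; the helper name
read_clock_discipline is hypothetical and not taken from the linked
article.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/timex.h>

/* Hypothetical helper: query the kernel clock discipline without
 * modifying it. On success, stores the current frequency correction in
 * parts per million and the current time offset (microseconds, or
 * nanoseconds when STA_NANO is set in the status word), and returns
 * the clock state (TIME_OK, TIME_ERROR, ...). Returns -1 with errno
 * set if the call fails. */
int read_clock_discipline(double *freq_ppm, long *offset)
{
    struct timex tx;

    assert(freq_ppm != NULL && offset != NULL);

    memset(&tx, 0, sizeof tx);
    tx.modes = 0;                 /* no mode bits set: read-only query */

    int state = adjtimex(&tx);
    if (state == -1)
        return -1;

    /* tx.freq is ppm with a 16-bit fractional part. */
    *freq_ppm = (double)tx.freq / 65536.0;
    *offset = tx.offset;
    return state;
}
```

Actually adjusting the frequency (setting ADJ_FREQUENCY in tx.modes)
needs root privileges, so the sketch stays with the read-only query.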

-Dmitry

On 2013-09-03 02:04, David Holmes wrote:
> There's a lot of history here. In my view of relative and absolute times
> the way the kernel worked in the past was buggy. I know some shared that
> view. But there was also a lot of contention about how changes to the
> system clock should, or should not, affect existing timer queues. And
> this has been true on Solaris as well as Linux. Hence in my view of the
> theory this should never have worked well, but the kernel implementors
> had a different view and so it did in fact work till recently (and I
> don't know what exactly has changed).
> 
> More inline ...
> 
> On 3/09/2013 5:24 AM, Andrew Haley wrote:
>> On 09/02/2013 04:44 PM, bruno bossola wrote:
>>
>>> Thanks for your answer.  That's probably correct in the sense this is
>>> what
>>> the code is doing :) I won't discuss that at all.
>>>
>>> However, pthread_cond_timedwait uses CLOCK_REALTIME as its clock id by
>>> default, which unfortunately is affected by
>>> clock_settime()/settimeofday() calls (on Linux): for that reason it
>>> cannot be used to measure nanosecond delays, which is what the
>>> specification requires. CLOCK_REALTIME is not guaranteed to count
>>> monotonically, as it is the actual "system time": each time my system
>>> syncs time using an NTP server on the net, the time might jump forward
>>> or backward. The correct call (again on Linux) would use
>>> CLOCK_MONOTONIC as the clock id.
>>>
>>> Your point would be that, based on Posix specs, setting the value of the
>>> CLOCK_REALTIME clock using clock_settime() should have no effect on
>>> threads
>>> that are waiting. Is that correct?
>>
>> The first thing I had to do was understand what was going on.
>>
>> Secondly, what could be done to fix it?  pthread_cond_timedwait() uses
>> the real-time clock.  That's not just by default: as far as I can see
>> there's no way to make it use any other clock.  So, fixing this
>> problem requires some reengineering, not just replacing CLOCK_REALTIME
>> with CLOCK_MONOTONIC.
> 
> You can associate any available clock with a pthread_cond_t by using a
> pthread_condattr_t on which  pthread_condattr_setclock has been called.
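
For illustration, a minimal sketch of the approach described just above:
a condition variable bound to CLOCK_MONOTONIC through
pthread_condattr_setclock, with the deadline computed from the same
clock. The helper names cond_init_monotonic and cond_wait_relative_ms
are hypothetical, and this is not the HotSpot code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: initialise `cond` so that timed waits are
 * measured against CLOCK_MONOTONIC instead of the default
 * CLOCK_REALTIME. Returns 0 on success or an error number. */
int cond_init_monotonic(pthread_cond_t *cond)
{
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(cond, &attr);

    pthread_condattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: wait on `cond` for at most `millis`
 * milliseconds. The caller must hold `mutex`, and `cond` must have been
 * set up with cond_init_monotonic. The absolute deadline is taken from
 * CLOCK_MONOTONIC, so stepping the system time does not move it.
 * Returns 0 on wakeup, ETIMEDOUT on timeout, or another error number. */
int cond_wait_relative_ms(pthread_cond_t *cond, pthread_mutex_t *mutex,
                          long millis)
{
    struct timespec deadline;

    assert(millis >= 0);

    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return errno;

    deadline.tv_sec += millis / 1000;
    deadline.tv_nsec += (millis % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {   /* carry into seconds */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    return pthread_cond_timedwait(cond, mutex, &deadline);
}
```

A real caller would re-check its predicate in a loop around the wait,
since a return value of 0 can also be a spurious wakeup.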
> 
>> Finally, I had to ask "Is this an important bug?"  I don't think it
>> is, and David Holmes has already done much more than I would have done.
>> If you make the time go backwards in a big jump, all manner of things
>> in a Unix system will go wrong.  In particular, timestamps on files
>> will be affected, and anything that relies on such things will be
>> affected too.  Anyone with root privileges should know this.
>>
>> The right answer in the short term is to use NTP at boot, so that
>> systems are not provoked in this way.
> 
> I strongly believe in two distinct notions of time: relative and
> absolute. Any API involving relative time should never be affected by
> changes to the absolute value of a clock; and anything involving
> absolute times obviously should. Unfortunately not everyone agrees with
> this simple model and the confusion surrounding absolute vs relative
> time APIs has existed pretty much from day one and it is a mess on all
> the OSes I've worked on. As a result the practical advice is as you say -
> don't mess with the system time without expecting problems.
> 
> That said, if we can make things play nicely together then we should.
> The timer/clock management code in the VM has gone very stale over the
> years and there are lots of known issues (Windows has its own set of
> problems!). When I worked in runtime back in 2007 this big cleanup was
> on my plate. But I moved to real-time and this work languished - as it
> was working okay it was never a high priority fix.
> 
> David
> -----
> 
>> Andrew.
>>
>>


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the source code.


More information about the hotspot-runtime-dev mailing list