RFR: 6900441 PlatformEvent.park(millis) on Linux could still be affected by changes to the time-of-day clock

David Holmes david.holmes at oracle.com
Wed Sep 11 04:06:29 PDT 2013


webrev:

http://cr.openjdk.java.net/~dholmes/6900441/webrev/

Short version: use CLOCK_MONOTONIC with pthread_cond_t objects for 
relative timed-waits.

Long version - see below :)

Thanks,
David
-----

Background: relative timed-waits (Thread.sleep, Object.wait, 
LockSupport.parkNanos) should not be affected by changes to the 
time-of-day clock. The Linux implementation uses pthread_cond_timedwait 
on default initialized pthread_cond_t objects. pthread_cond_timedwait 
actually takes an absolute time and the default clock is CLOCK_REALTIME 
(which is a time-of-day clock). You would expect then that changing the 
time-of-day would impact these supposedly relative timed waits: time 
going forward would cause early returns; time going backwards would 
cause extended delays. But for many years it did not - the reason being 
that the glibc/pthreads implementors had not implemented it to work that 
way. Originally the monotonic clock didn't even exist so there was only 
one way this could be implemented (and we had LinuxThreads vs NPTL etc 
etc - lots of history).
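
To make the mechanism concrete, here is a minimal sketch of a "relative" wait built on a default-initialized pthread_cond_t. It is not the HotSpot code; the helper names deadline_after and realtime_wait_millis are made up for illustration. The point is that the relative timeout is converted to an absolute CLOCK_REALTIME deadline, so moving the time-of-day clock moves the wakeup with it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: turn a relative timeout into the absolute
 * deadline that pthread_cond_timedwait() expects, on the given clock. */
static struct timespec deadline_after(clockid_t clock, long millis)
{
    struct timespec ts;

    assert(millis >= 0);
    clock_gettime(clock, &ts);
    ts.tv_sec  += millis / 1000;
    ts.tv_nsec += (millis % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {        /* carry into seconds */
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Hypothetical "relative" wait on a default-initialized condition
 * variable. Nothing ever signals it, so it normally returns ETIMEDOUT
 * once the CLOCK_REALTIME deadline has passed. */
static int realtime_wait_millis(long millis)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;  /* CLOCK_REALTIME */
    struct timespec deadline = deadline_after(CLOCK_REALTIME, millis);
    int rc;

    pthread_mutex_lock(&lock);
    /* No predicate loop here: a spurious wakeup would simply return 0. */
    rc = pthread_cond_timedwait(&cond, &lock, &deadline);
    pthread_mutex_unlock(&lock);
    return rc;
}
```

If the system time is set back by an hour while a thread is inside realtime_wait_millis(1000), a pthreads implementation that honours CLOCK_REALTIME would keep it blocked for roughly an hour and a second, which is the 'hang' described in this post.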

Skip forward to 2009 and glibc was modified to correct this behaviour - 
but only on 64-bit Linux (no idea why; the 32-bit variant was added 
earlier this year, but I don't know which glibc version that corresponds 
to). For 64-bit this was, as I understand it, glibc 2.12. By coincidence 
in 2009 I filed 6900441 as I knew we would have to change the 
implementation when glibc was fixed - unfortunately I wasn't aware that 
it actually had been fixed at that time. The bug was slated for future 
work, I went on to other things, blah blah blah ...

We didn't get a flurry of bug reports in 2009, nor 2010, 2011 ... why 
not? Partly because people tend to avoid large jumps in system time and 
this bug only becomes noticeable when "large" backward jumps occur and 
threads 'hang'. (Forward jumps cause early returns that are either 
filtered out or permitted as spurious wakeups.) And partly, it seems, 
because it took time for glibc 2.12 to get into some Linux distributions 
and for people to take up the new version.

Skip forward to today and we are starting to see a number of reports 
about this problem, as people are now using the new glibc (and may have 
been for a while) and are also seeing time changes have an unexpected 
impact on their programs. So this needs to be fixed ASAP, before it 
becomes a major problem, and the fix will be backported to OpenJDK 7 
and 6.

The fix is relatively straightforward: the pthread_cond_t objects used 
for the relative timed-waits have to be associated with CLOCK_MONOTONIC 
so that they are not affected by changes to the time-of-day clock. There 
is a slight complication in that the LockSupport.park API supports both 
relative and absolute timed-waits, so we need two different 
pthread_cond_t objects, each associated with a different clock.
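
As a rough illustration of that approach (a sketch only, not the code in the webrev), the following uses pthread_condattr_setclock to bind one condition variable to CLOCK_MONOTONIC for relative waits and leaves a second one on the default CLOCK_REALTIME for absolute deadlines. The struct parker type and the parker_init, park_relative and park_absolute functions are hypothetical names chosen for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical parker: one mutex, one condition variable per clock. */
struct parker {
    pthread_mutex_t lock;
    pthread_cond_t  rel_cond;   /* bound to CLOCK_MONOTONIC */
    pthread_cond_t  abs_cond;   /* default clock: CLOCK_REALTIME */
};

/* Returns 0 on success, else an errno value. For brevity the error
 * paths do not destroy objects that were already initialized. */
static int parker_init(struct parker *p)
{
    pthread_condattr_t attr;
    int rc;

    assert(p != NULL);
    if ((rc = pthread_mutex_init(&p->lock, NULL)) != 0) return rc;
    if ((rc = pthread_cond_init(&p->abs_cond, NULL)) != 0) return rc;
    if ((rc = pthread_condattr_init(&attr)) != 0) return rc;
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(&p->rel_cond, &attr);
    pthread_condattr_destroy(&attr);
    return rc;
}

/* The deadline must be measured on the clock `cond` was created with. */
static int wait_until(struct parker *p, pthread_cond_t *cond,
                      const struct timespec *deadline)
{
    int rc;

    pthread_mutex_lock(&p->lock);
    /* A real parker would loop on a predicate; this sketch does not. */
    rc = pthread_cond_timedwait(cond, &p->lock, deadline);
    pthread_mutex_unlock(&p->lock);
    return rc;
}

/* Relative wait: deadline computed from, and waited on, CLOCK_MONOTONIC. */
static int park_relative(struct parker *p, long millis)
{
    struct timespec ts;

    assert(millis >= 0);
    clock_gettime(CLOCK_MONOTONIC, &ts);
    ts.tv_sec  += millis / 1000;
    ts.tv_nsec += (millis % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {        /* carry into seconds */
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return wait_until(p, &p->rel_cond, &ts);
}

/* Absolute wait: deadline is milliseconds since the epoch, so it is
 * waited on the CLOCK_REALTIME condition variable. */
static int park_absolute(struct parker *p, long long epoch_millis)
{
    struct timespec ts;

    assert(epoch_millis >= 0);
    ts.tv_sec  = (time_t)(epoch_millis / 1000);
    ts.tv_nsec = (long)(epoch_millis % 1000) * 1000000L;
    return wait_until(p, &p->abs_cond, &ts);
}
```

With nothing signalling either condition variable, park_relative(&p, 20) and park_absolute(&p, 1000) (a deadline long in the past) would both be expected to return ETIMEDOUT. Anything that wakes the waiter would presumably also need to know which of the two condition variables it is blocked on, or signal both.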

Notes:

1. This is a Linux-only fix. I don't know if we also have the problem on 
other OSes, but it hasn't been flagged, and while I will check, it is 
more important to get this out for Linux ASAP.

2. Given the late stage of the JDK 8 release cycle, and to minimize risk 
and ease backporting to 6 and 7, I made no attempt to do any kind of code 
clean up here. This code is full of historical anachronisms and for Java 
9 I hope to see it all cleaned up, but for now all the baggage and 
duplication must remain as-is.

3. We can obviously only fix this if we have a monotonic clock, hence 
its availability has to be used to guard the new code. These days it 
would be extremely rare to not have the monotonic clock but I still use 
the guard.

4. CLOCK_MONOTONIC is not in fact completely immune to changes in the 
time-of-day clock but it won't jump backwards. The new clock on the 
block is CLOCK_MONOTONIC_RAW which should always advance at a constant 
rate with no jumps. We have an RFE to start using CLOCK_MONOTONIC_RAW for 
System.nanoTime(), and we would use it for the pthread_cond_t too, but 
we can't use that until the JDK 8 build platform is updated to a Linux 
version that actually has that clock at build time. That update is very 
near but not yet here so we stay with CLOCK_MONOTONIC.
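
The guard in note 3 could look something like the sketch below. This is an assumption about the general shape, not the check used in the webrev: monotonic_waits_supported and guarded_cond_init are hypothetical names, and the fallback simply keeps the old default-clock behaviour when CLOCK_MONOTONIC cannot be used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical run-time check: true if CLOCK_MONOTONIC can be read and
 * can be attached to a condition variable attribute. */
static bool monotonic_waits_supported(void)
{
    struct timespec ts;
    pthread_condattr_t attr;
    bool ok;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return false;
    if (pthread_condattr_init(&attr) != 0)
        return false;
    ok = (pthread_condattr_setclock(&attr, CLOCK_MONOTONIC) == 0);
    pthread_condattr_destroy(&attr);
    return ok;
}

/* Initialize `cond` on CLOCK_MONOTONIC when the guard passes, otherwise
 * on the default CLOCK_REALTIME. The clock actually chosen is written to
 * `clock_used` so callers compute their deadlines on the same clock. */
static int guarded_cond_init(pthread_cond_t *cond, clockid_t *clock_used)
{
    pthread_condattr_t attr;
    int rc;

    assert(cond != NULL && clock_used != NULL);
    if (!monotonic_waits_supported()) {
        *clock_used = CLOCK_REALTIME;       /* old behaviour */
        return pthread_cond_init(cond, NULL);
    }
    if ((rc = pthread_condattr_init(&attr)) != 0)
        return rc;
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(cond, &attr);
    pthread_condattr_destroy(&attr);
    if (rc == 0)
        *clock_used = CLOCK_MONOTONIC;
    return rc;
}
```

Reporting the chosen clock back to the caller matters because a deadline computed on one clock and waited for on the other would be badly wrong in either direction.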


More information about the hotspot-dev mailing list