RFR: 6900441 PlatformEvent.park(millis) on Linux could still be affected by changes to the time-of-day clock
David Holmes
david.holmes at oracle.com
Wed Sep 11 04:06:29 PDT 2013
webrev:
http://cr.openjdk.java.net/~dholmes/6900441/webrev/
Short version: use CLOCK_MONOTONIC with pthread_cond_t objects for
relative timed-waits.
Long version - see below :)
Thanks,
David
-----
Background: relative timed-waits (Thread.sleep, Object.wait,
LockSupport.parkNanos) should not be affected by changes to the
time-of-day clock. The Linux implementation uses pthread_cond_timedwait
on default initialized pthread_cond_t objects. pthread_cond_timedwait
actually takes an absolute time and the default clock is CLOCK_REALTIME
(which is a time-of-day clock). You would expect then that changing the
time-of-day would impact these supposedly relative timed waits: time
going forward would cause early returns; time going backwards would
cause extended delays. But for many years it did not - the reason being
that the glibc/pthreads implementors had not implemented it to work that
way. Originally the monotonic clock didn't even exist, so there was only
one way this could be implemented (and we had LinuxThreads vs NPTL, etc.
- lots of history).
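For illustration, here is a minimal C sketch of the vulnerable pattern. It is not the HotSpot code in the webrev; deadline_from_now and relative_wait_realtime are hypothetical names. A relative timeout is turned into an absolute deadline read from CLOCK_REALTIME, because that is the clock a default-initialized pthread_cond_t measures against. On a glibc that honours the condvar's clock, setting the time-of-day back by an hour after the deadline has been computed would stretch this "relative" wait by about an hour.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>    /* ETIMEDOUT, returned by pthread_cond_timedwait */
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NANOS_PER_SEC 1000000000L

/* Hypothetical helper: "now" on the given clock plus a relative timeout,
 * with tv_nsec normalised into [0, NANOS_PER_SEC). */
static struct timespec deadline_from_now(clockid_t clock, int64_t rel_ns) {
    assert(rel_ns >= 0);
    struct timespec ts;
    clock_gettime(clock, &ts);
    ts.tv_sec  += (time_t)(rel_ns / NANOS_PER_SEC);
    ts.tv_nsec += (long)(rel_ns % NANOS_PER_SEC);
    if (ts.tv_nsec >= NANOS_PER_SEC) {
        ts.tv_sec  += 1;
        ts.tv_nsec -= NANOS_PER_SEC;
    }
    return ts;
}

/* The problematic pattern: the condvar is default-initialised, so its
 * absolute deadline is compared against CLOCK_REALTIME. Any change to the
 * time-of-day clock after this point moves the effective timeout.
 * The caller must hold m. Returns 0 if woken, ETIMEDOUT on timeout. */
static int relative_wait_realtime(pthread_mutex_t *m, pthread_cond_t *c,
                                  int64_t rel_ns) {
    struct timespec abstime = deadline_from_now(CLOCK_REALTIME, rel_ns);
    return pthread_cond_timedwait(c, m, &abstime);
}
```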
Skip forward to 2009 and glibc was modified to correct this behaviour -
but only on 64-bit Linux (I have no idea why; the 32-bit variant was
added earlier this year, but I don't know what glibc version that
corresponds to). For 64-bit this was, as I understand it, glibc 2.12. By coincidence
in 2009 I filed 6900441 as I knew we would have to change the
implementation when glibc was fixed - unfortunately I wasn't aware that
it actually had been fixed at that time. The bug was slated for future
work, I went on to other things, blah blah blah ...
We didn't get a flurry of bug reports in 2009, nor 2010, 2011 ... why
not? Partly because people tend to avoid large jumps in system time and
this bug only becomes noticeable when "large" backward jumps occur and
threads 'hang'. (Forward jumps cause early returns that are either
filtered out or permitted as spurious wakeups.) And partly, it seems,
because it took time for glibc 2.12 to get into some Linux distributions
and for people to take up the new version.
Skip forward to today and we are starting to see a number of reports
about this problem, as people are now using the new glibc (and may have
been for a while) and are seeing time changes have an unexpected impact
on their programs. So this needs to be fixed ASAP, before it becomes a
major problem, and the fix will be backported to OpenJDK 7 and 6.
The fix is relatively straightforward: the pthread_cond_t objects used
for the relative timed-waits have to be associated with CLOCK_MONOTONIC
so that they are not affected by changes to the time-of-day clock. There
is a slight complication in that the LockSupport.park API supports both
a relative and an absolute timed-wait (parkNanos and parkUntil). The
absolute deadline is a time-of-day value, so it must still be measured
against CLOCK_REALTIME, which means we need two different pthread_cond_t
objects, each associated with a different clock.
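A minimal sketch of the two-condvar idea in C, assuming POSIX pthread_condattr_setclock is available (as it is with glibc on Linux). The parker_t type and the parker_* functions are hypothetical and only illustrate the approach; the actual change is in the webrev above. Relative waits read their deadline from CLOCK_MONOTONIC and wait on a condvar bound to that clock, while absolute waits keep a default (CLOCK_REALTIME) condvar because the caller's deadline is a time-of-day value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NANOS_PER_SEC 1000000000L

/* Hypothetical parker: two condvars, one per clock, sharing one mutex. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  rel_cond;  /* CLOCK_MONOTONIC: relative timed-waits */
    pthread_cond_t  abs_cond;  /* CLOCK_REALTIME: absolute deadlines */
    int             permit;    /* set by unpark, consumed by park */
} parker_t;

static int parker_init(parker_t *p) {
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);
    if (rc != 0) return rc;
    /* Bind the relative-wait condvar to the monotonic clock. */
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0) rc = pthread_cond_init(&p->rel_cond, &attr);
    pthread_condattr_destroy(&attr);
    if (rc != 0) return rc;
    /* Default attributes mean CLOCK_REALTIME, which absolute deadlines need. */
    rc = pthread_cond_init(&p->abs_cond, NULL);
    if (rc != 0) { pthread_cond_destroy(&p->rel_cond); return rc; }
    rc = pthread_mutex_init(&p->mutex, NULL);
    if (rc != 0) {
        pthread_cond_destroy(&p->rel_cond);
        pthread_cond_destroy(&p->abs_cond);
        return rc;
    }
    p->permit = 0;
    return 0;
}

/* Wait on cond until the permit is set or abstime (on cond's own clock)
 * passes. A spurious wakeup re-waits until the same deadline. */
static int parker_wait(parker_t *p, pthread_cond_t *cond,
                       const struct timespec *abstime) {
    pthread_mutex_lock(&p->mutex);
    int rc = 0;
    while (!p->permit && rc == 0)
        rc = pthread_cond_timedwait(cond, &p->mutex, abstime);
    int got = p->permit;
    p->permit = 0;
    pthread_mutex_unlock(&p->mutex);
    return got ? 0 : rc;  /* rc is ETIMEDOUT (or another error) here */
}

/* Relative wait: deadline taken from CLOCK_MONOTONIC to match rel_cond,
 * so setting the time-of-day clock does not move it. rel_ns must be >= 0. */
static int parker_park_nanos(parker_t *p, int64_t rel_ns) {
    assert(rel_ns >= 0);
    struct timespec abstime;
    clock_gettime(CLOCK_MONOTONIC, &abstime);
    abstime.tv_sec  += (time_t)(rel_ns / NANOS_PER_SEC);
    abstime.tv_nsec += (long)(rel_ns % NANOS_PER_SEC);
    if (abstime.tv_nsec >= NANOS_PER_SEC) {
        abstime.tv_sec  += 1;
        abstime.tv_nsec -= NANOS_PER_SEC;
    }
    return parker_wait(p, &p->rel_cond, &abstime);
}

/* Absolute wait: the caller's deadline is a CLOCK_REALTIME time. */
static int parker_park_until(parker_t *p, const struct timespec *deadline) {
    return parker_wait(p, &p->abs_cond, deadline);
}

static void parker_unpark(parker_t *p) {
    pthread_mutex_lock(&p->mutex);
    p->permit = 1;
    /* The parked thread may be waiting on either condvar; signal both. */
    pthread_cond_signal(&p->rel_cond);
    pthread_cond_signal(&p->abs_cond);
    pthread_mutex_unlock(&p->mutex);
}
```

Signalling both condvars in parker_unpark is simply the easiest way for this sketch to reach a thread whichever kind of wait it is in; it is a design choice here, not a claim about how HotSpot does it.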
Notes:
1. This is a Linux fix only. I don't know if we also have the problem on
other OSes; it hasn't been flagged, and while I will check, it is more
important to get this out for Linux ASAP.
2. Given the late stage of the JDK 8 release cycle (to minimize risk),
and to ease backporting to 6 and 7, I made no attempt to do any kind of
code cleanup here. This code is full of historical anachronisms and for
Java 9 I hope to see it all cleaned up, but for now all the baggage and
duplication must remain as-is.
3. We can obviously only fix this if we have a monotonic clock, so its
availability has to be used to guard the new code. These days it would
be extremely rare not to have the monotonic clock, but I still use the
guard.
4. CLOCK_MONOTONIC is not in fact completely immune to changes in the
time-of-day clock (its rate can still be slewed, e.g. by NTP), but it
won't jump backwards. The new clock on the block is CLOCK_MONOTONIC_RAW,
which should always advance at a constant rate with no jumps. We have an
RFE to start using CLOCK_MONOTONIC_RAW for System.nanoTime(), and we
would use it for the pthread_cond_t too, but we can't use it until the
JDK 8 build platform is updated to a Linux version that actually has
that clock at build time. That update is very near but not yet here, so
we stay with CLOCK_MONOTONIC.
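Notes 3 and 4 could be sketched in C roughly as follows, under the same caveats: the helper names are hypothetical, and a runtime clock_getres check is just one plausible way to implement the guard, not necessarily the one in the webrev. The compile-time #ifdef on CLOCK_MONOTONIC_RAW shows why the build platform matters: if the headers on the build machine don't define the constant, the code cannot refer to it at all. The sketch only applies CLOCK_MONOTONIC_RAW to a nanoTime-style clock reading.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Runtime guard (note 3): does this system provide CLOCK_MONOTONIC? */
static bool have_monotonic_clock(void) {
    struct timespec res;
    return clock_getres(CLOCK_MONOTONIC, &res) == 0;
}

/* Initialise a condvar for relative waits. Uses CLOCK_MONOTONIC when the
 * guard passes and the attribute is accepted; otherwise falls back to the
 * default (CLOCK_REALTIME) behaviour. *clock_out tells the caller which
 * clock to read when computing absolute deadlines for this condvar. */
static int init_relative_cond(pthread_cond_t *cond, clockid_t *clock_out) {
    if (have_monotonic_clock()) {
        pthread_condattr_t attr;
        if (pthread_condattr_init(&attr) == 0) {
            int rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
            if (rc == 0) rc = pthread_cond_init(cond, &attr);
            pthread_condattr_destroy(&attr);
            if (rc == 0) {
                *clock_out = CLOCK_MONOTONIC;
                return 0;
            }
        }
    }
    *clock_out = CLOCK_REALTIME;
    return pthread_cond_init(cond, NULL);
}

/* nanoTime-style reader (note 4). CLOCK_MONOTONIC_RAW can only be named
 * if the build-time headers define it; the running kernel must also
 * support it, so fall back to CLOCK_MONOTONIC if the call fails. */
static long long monotonic_nanos(void) {
    struct timespec ts;
#ifdef CLOCK_MONOTONIC_RAW
    if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) == 0)
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
#endif
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is set */
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```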
More information about the hotspot-dev mailing list