Leap Second Kernel issue (6/30/2012) and JDK-6900441

Mon Apr 20 02:59:16 UTC 2015

Hi Andrew,

I've attempted to answer this email a few times now but eventually 
realized there is not enough information to really understand what you 
have observed. You refer to the fix for 6311057

https://bugs.openjdk.java.net/browse/JDK-6311057

as being responsible for use of CLOCK_MONOTONIC, but it wasn't - that 
was 6900441 as per your subject line. So I'm unclear when you really 
mean 6311057 and when you mean 6900441 - but 6311057 was fixed way back 
in early JDK 6 so would never be seen in 7 or later.

There are three parts to this issue:
- how the OS/kernel handles changes to the TOD for pthread_condtimedwait 
with CLOCK_REALTIME
- how the "time" read by javaTimeMillis/javaTimeNanos was affected by 
changes to the TOD
- how the VM logic responded to early returns from the OS/kernel 
pthread_condtimedwait (ie whether it read the elapsed time - and how - 
and whether it calculated an updated time-to-wait).

Only some timed-waiting APIs (eg Thread.sleep) prevent early returns; 
for others (eg. Object.wait, ParkSupport.parkNanos) it is just 
considered a "spurious wakeup" which is permitted by the spec.

There's no simple story to explain all the possibilities as I certainly 
can't tell you exactly which kernel behaved in what way. I tried to 
summarise the situation in my 2009/11/11 comment in 6311057. The fix for 
6900441 finally made the change to using CLOCK_MONOTONIC for the 
pthread_cond_t variables for relative waits (which makes them almost 
immune to changes in the TOD).

The fix for 8037340 reverted the code back to calculating the 
time-to-wait based on CLOCK_REALTIME as that is the only clock the 
platform/OS "semaphore" uses. This has been inadvertently changed by the 
fix for 6900441. There's only one use of that Semaphore the sr_semaphore 
used for a simple suspend/resume mechanism. It does a 2ns timed-wait. 
This is susceptible to changes in the TOD: if time goes forward we will 
return prematurely; if time goes back, we will wait longer. But for 
small changes, like a second, the effect would be benign.

For systems with 6311057 fixed I'm at a loss to explain the high CPU 
utilization issue. Even if the TOD changes forward causing early returns 
from the timed-waits, the logic in most cases would re-try the wait with 
a re-calculated time and the second wait would not be affected by the 
earlier TOD change. If the TOD changes backwards, as with a leap-second, 
then potentially timed-waits will be a second longer (assuming we're not 
using CLOCK_MONOTONIC and depending on kernel behaviour) but that would 
not cause CPU utilization spikes - quite the opposite (unless you 
observe threads going into busy-wait loops polling the threads that are 
now delayed longer than expected ?).

Cheers,
David

On 18/04/2015 1:57 AM, Tolbert, Andrew wrote:
> DISCLAIMER: I fully acknowledge that this is not a jvm issue and I
> know well that the real solution to this problem comes in the form a
> kernel-level fix, but I found this interesting and was hoping to get
> some validation of my thoughts.   I also do not have much familiarity
> with hotspot or JVM internals, so I apologize if I've said anything
> wrong here or am not following the correct procedures.
>
> I've been doing some investigation pending the leap second that will
> be added on June 30, 2015 at 23:59:59 UTC
> (https://hpiers.obspm.fr/eoppc/bul/bulc/bulletinc.49).  Back on June
> 30, 2012 a bunch of services were impacted by the notorious leap
> second bug in the linux kernel
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1020285).   Some
> of these applications were java-based (cassandra, hadoop,
> elasticsearch, etc.).
>
> To get a better understanding of the issue, I ran a test on an old
> ubuntu install running 10.10 and kernel version 2.6.35-32.    I set up
> a test with a completely idle cassandra 1.2.19 instance using jdk7u75
> and used John Stultz's leap-a-day
> (https://github.com/johnstultz-work/timetests/blob/master/leap-a-day.c)
> program to set the time near midnight and inject a leap second at
> 23:59:59.  What I found surprised me a little bit,  I was not able to
> manifest the issue that I had previously experienced in 2012!
>
> That got me thinking, maybe there was some particular change in the
> JVM to accommodate for this issue?  I went back to JDK6u32 (version I
> believe I was running back in 2012) and reran my test, and my cpu
> utilization pegged out at exactly midnight!
>
> I then tried upgrading to the latest JDK6 version (u45) and also could
> reproduce the issue.  I tried an early arbitrary version of JDK7 and
> the issue also manifested.  I then went through my test with various
> JDK updates and found that the earliest release where I could not
> reproduce the issue was 7u60.
>
> Looking through the bug fix list for u60
> (http://www.oracle.com/technetwork/java/javase/2col/7u60-bugfixes-2202029.html),
> I found two fixes around time that seemed interesting:
>
> - JDK-6311057: Java Thread.sleep(timeout) is influenced by changes to
> System time on Linux
> (https://bugs.openjdk.java.net/browse/JDK-6311057)
> - JDK-8037340: Linux semaphores to use CLOCK_REALTIME
> (https://bugs.openjdk.java.net/browse/JDK-8037340)
>
> JDK-6311057 switches to use CLOCK_MONOTONIC instead of gettimeofday in
> compute_abstime
> (http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/fd506dce728a?revcount=240)
>
> JDK-8037340 was a change made in response to JDK-6311057 to use
> CLOCK_REALTIME in Semaphore::timedWait
> (http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/3031c29f30b5?revcount=240)
>
> My theory was that JDK-6311057 causes the jvm to behave correctly
> during the leap second bug because it uses monotonic time instead of
> gettimeofday/CLOCK_REALTIME, which will be susceptible to the leap
> second kernel bug, and code that does things like sleep/park in a loop
> for less than a second until a condition is met basically enters a
> busy loop and consumes cpu.
>
> I backed out JDK-6311057 at jdk7u60 and built openjdk and reran my
> test low and behold, as soon as the leap second is inserted my
> cassandra jvm consumes 100% cpu.
>
> So what I'm hoping to get out of this is:
>
> 1. Does my line of reasoning seem correct?  Is it JDK-6311057 that is
> likely protecting my JVM from entering a busy loop and needlessly
> consuming cpu?
> 2. As JDK-8037340 maintains that Semaphore::timedWait uses
> CLOCK_REALTIME, could I anticipate any possible issues involving leap
> second?   I was trying to think of a scenario that would exercise this
> logic during a leap second, but I am not familiar enough with hotspot
> and how java code would map to this to try to exercise it during a
> leap second.  I tried a program that in effect does
> while(!sem.tryAcquire(1, TimeUnit.microseconds)), but it behaves
> normally after a leap second with the kernel issue.
>
> Thank you for your time!
>
> Regards,
> Andy
>