Safepointing & Locking issue

Nicolas Michael mailing at nmichael.de
Thu Jul 8 06:12:51 PDT 2010


Hi David and Dave,

thanks a lot for your hints and background details on this issue.

Adding -XX:+PrintSafepointStatistics to the options as suggested by David gives
about 1200+ DeoptimizeFrame operations per second in the log file:

         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
..
168.976: DeoptimizeFrame                  [     217          0              0   ]      [     0     0     0     0     0    ]  0
168.976: DeoptimizeFrame                  [     217          0              0   ]      [     0     0     0     0     0    ]  0
168.978: DeoptimizeFrame                  [     217          0              0   ]      [     0     0     0     0     0    ]  0
168.978: DeoptimizeFrame                  [     217          0              0   ]      [     0     0     0     0     0    ]  0
168.978: DeoptimizeFrame                  [     217          0              0   ]      [     0     0     0     0     0    ]  0
..

The output from the various switches (including -XX:+LogCompilation) is a bit
interleaved, but around this point in time (second 168ff) there is no other
compilation-related output. So is there any way to find out more about why
a frame is being deoptimized? This would then lead to a safepoint, right?
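
If the interleaved logs don't turn up anything, I might try to catch the
requesters directly with the DTrace pid provider -- something like this
sketch (untested; it assumes the Studio-mangled VM_DeoptimizeFrame symbols
in libjvm.so match the wildcard):

  # Who requests DeoptimizeFrame VM operations? The VM op constructor
  # runs in the requesting thread, so its stack shows the trigger.
  dtrace -n '
  pid$target:libjvm.so:*VM_DeoptimizeFrame*:entry
  {
      @[probefunc, jstack(30)] = count();
  }' -p <pid>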

Oh, and yes, this JVM does have a JVMTI agent attached to it. But as far as I
have been able to find out so far, it should only register for GC start/end
events (which are very rare in this process).

With DTrace I just saw -- thanks to the really cool hotspot probes (some of them
seem to be new; it's the first time I've seen that great level of detail!) --
that there are about 1000 exceptions occurring per second ("ExceptionOccurred-entry"),
but less than one "ThrowNew-entry" per second. I'm not sure how ThrowNew
relates to ExceptionOccurred, what they are actually counting, and whether
1000+ ExceptionOccurred calls per second is "a lot" (it seems quite high ;-) ).
Could exceptions also cause deoptimization?
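
In case somebody wants to reproduce this measurement: since hotspot_jni
fires an entry probe for every JNI function, the calling stacks can be
aggregated roughly along these lines (a sketch; <pid> is the JVM's
process id):

  # Aggregate the stacks that call the JNI ExceptionOccurred function,
  # sampled for 10 seconds.
  dtrace -n '
  hotspot_jni$target:::ExceptionOccurred-entry
  {
      @[jstack(20)] = count();
  }
  tick-10s { exit(0); }
  ' -p <pid>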

Unfortunately, we don't have a stand-alone, easy-to-reproduce test case for this.

These are all the _sync_* counters that I get:

sun.rt._sync_ContendedLockAttempts = 1625
sun.rt._sync_Deflations = 66821
sun.rt._sync_EmptyNotifications = 0
sun.rt._sync_FailedSpins = 0
sun.rt._sync_FutileWakeups = 56
sun.rt._sync_Inflations = 66870
sun.rt._sync_MonExtant = 6784
sun.rt._sync_MonInCirculation = 0
sun.rt._sync_MonScavenged = 0
sun.rt._sync_Notifications = 116109
sun.rt._sync_Parks = 117295
sun.rt._sync_PrivateA = 0
sun.rt._sync_PrivateB = 0
sun.rt._sync_SlowEnter = 0
sun.rt._sync_SlowExit = 0
sun.rt._sync_SlowNotify = 0
sun.rt._sync_SlowNotifyAll = 0
sun.rt._sync_SuccessfulSpins = 0
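
Given the contention counters above, I'll probably also point DTrace at the
monitor probes next, to see which classes' locks are actually contended --
along these lines (a sketch; it needs the JVM started with
-XX:+ExtendedDTraceProbes, and I'm assuming arg2/arg3 of the probe carry
the class name and its length):

  # Count contended monitor-enter events per Java class.
  dtrace -n '
  hotspot$target:::monitor-contended-enter
  {
      @[copyinstr(arg2, arg3)] = count();
  }' -p <pid>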

Thanks for pointing me to synchronizer.cpp -- your comments there sound well
worth reading!

Thanks,
Nick.

 

> -----Original Message-----
> From: ext Dave Dice [mailto:dave.dice at oracle.com] 
> Sent: Thursday, July 8, 2010 13:57
> To: Nicolas Michael
> Cc: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: Safepointing & Locking issue
> 
> 
> On 2010-7-8, at 3:21 AM, Nicolas Michael wrote:
> 
> > Hi all,
> > 
> > Yesterday I was investigating a performance problem with one of our
> > applications where I observed some "strange" JVM behavior (which might be
> > as expected, but I'm not sure...).
> > 
> > It's a heavily multi-threaded Java application with about 300 threads,
> > running on a 4-CPU Solaris x86 box using a 32-bit HotSpot Client VM build
> > 16.3-b01 (JDK 1.6.0_20-b02). It seems to make extensive use of
> > synchronization and suffers badly from lock contention. Thread #10 (the
> > VM thread) eats close to one CPU, mostly doing lock deflations during
> > safepoints, in stacks like this:
> > 
> >              libjvm.so`__1cNObjectMonitorHis_busy6kM_i_+0x26
> >              libjvm.so`__1cSObjectSynchronizerVdeflate_idle_monitors6F_v_+0x210
> >              libjvm.so`__1cUSafepointSynchronizeQdo_cleanup_tasks6F_v_+0x19
> >              libjvm.so`__1cUSafepointSynchronizeFbegin6F_v_+0x3d0
> >              libjvm.so`__1cIVMThreadEloop6M_v_+0x26c
> >              libjvm.so`__1cIVMThreadDrun6M_v_+0x88
> >              libjvm.so`java_start+0xf9
> >              libc.so.1`_thr_setup+0x4e
> >              libc.so.1`_lwp_start
> >              287
> > 
> > From a safepointing point of view, the process isn't doing much useful
> > work -- it only gets short periods of work, interrupted by safepoints that
> > take longer than the actual application runtime -- at a rate of 1709
> > safepoints per second (see below):
> > 
> > Application time: 0.0000450 seconds
> > Total time for which application threads were stopped: 0.0002397 seconds
> > Application time: 0.0001607 seconds
> > Total time for which application threads were stopped: 0.0002603 seconds
> > Application time: 0.0000469 seconds
> > Total time for which application threads were stopped: 0.0011530 seconds
> > ..
> > 
> > From the perfdata statistics, I see
> > sun.rt._sync_ContendedLockAttempts    increasing by   12 / sec
> > sun.rt._sync_Deflations               increasing by  477 / sec
> > sun.rt._sync_Inflations               increasing by  477 / sec
> > sun.rt._sync_Notifications            increasing by  791 / sec
> > sun.rt._sync_Parks                    increasing by  795 / sec
> > sun.rt.safepoints                     increasing by 1709 / sec
> > sun.rt.safepointTime is 0.82s and sun.rt.safepointSyncTime 0.15s per second.
> > 
> > I guess we're using too much synchronization, Object.wait(ms) and notify()
> > (or even notifyAll()) ... ;-) but I'm wondering whether it is "normal" to
> > have that many safepoints and lock inflations/deflations because of that?
> > 
> > My understanding was that safepoints are mainly caused by GC (not the case
> > here) or deoptimization (which should be visible with +LogCompilation,
> > right? but nothing there either). Or bias revocation. Indeed, we do have
> > some bias revocations (like the ones below), but at a much lower rate
> > (only ~3.4 times per second):
> 
> Hi Nick,
> 
> Do you have a test case that's easily packaged up?
> 
> Inflation can be induced by a number of behaviors, such as wait(),
> contention, deoptimization, bias revocation, JNI monitor access, some
> hashCode() actions, etc. By default, however, we don't trigger
> stop-the-world safepoints to deflate. Absent safepoints, the ultimate
> bound on the number of ObjectMonitors in play is just the number of live
> and dead objects in the heap. That is, there's no natural back-pressure
> on the number of monitors. If the number of monitors becomes too large,
> the JVM can spend inordinate amounts of time in deflate_idle_monitors()
> scanning, trying to scavenge and deflate monitors, so there's both a time
> problem and a space problem (arguably the space problem drives the time
> problem, but they manifest differently for customers).
> 
> There's a day-one assumption in various pieces of code that the
> object:monitor relationship is stable except over potential safepoints.
> That simplifies many operations but is the source of the issue you're
> seeing. Ultimately, it's probably better to deflate on the fly and
> decouple safepoints from deflation. (The old ExactVM implementation did
> just that.) It's relatively straightforward, but touches a good deal of
> code. If you're curious, I've left more detailed comments in
> synchronizer.cpp.
> 
> In the meantime there are some palliative approaches under consideration:
> (a) induce STW safepoints to bound the number of extant monitors and in
> turn bound the cost of deflate_idle_monitors(), and (b) partition the
> monitors so we only need to scan those that were possibly inflated in the
> last STW epoch. We also have code percolating through the system that
> might cut down on some of the cases of inflation, which is why a test
> case would be of interest -- to see if the specific case you're running
> into might already be addressed.
> 
> I vaguely recall adding a perfdata counter that reflects the number of
> ObjectMonitors in the system. Were there any other rt._sync_* variables
> visible in perfdata?
> 
> Regards
> Dave
> 
> 
> 
> > 
> > <writer thread='10'/>
> > Revoking bias with potentially per-thread safepoint:
> > Revoking bias of object 0x6df887d0 , mark 0x08ee3805 , type
> > org.jacorb.poa.RequestQueue , prototype header 0x00000005 , allow rebias
> > 0 , requesting thread 0x09065800
> >  Revoked bias of currently-unlocked object
> > <writer thread='149'/>
> > Revoking bias by walking my own stack:
> > Revoking bias of object 0x6df887c8 , mark 0x09065805 , type
> > java.lang.Object , prototype header 0x00000005 , allow rebias 0 ,
> > requesting thread 0x09065800
> >  Revoked bias of currently-locked object
> > Inflating object 0x6df887c8 , mark 0x0906042a , type java.lang.Object
> > 
> > Also, disabling biased locking has no effect on the magnitude of
> > safepointing, so I concluded that safepointing must then be caused by
> > lock inflation/deflation (which seems to be supported by the stack above).
> > 
> > Afaik, a lock is inflated when the light-weight locking algorithm can't
> > acquire the lock immediately ("contention"). It will then fall back to
> > heavy-weight locking (an OS mutex?) and notify (wake up) all waiting
> > threads (?). We have about 40 lock inflations per contended lock attempt
> > (see the perfdata statistics above) -- does that mean there were (on
> > average) 40 threads waiting on that contended lock?
> > 
> > Since we have the same number of deflations as inflations, I would
> > assume that each heavy-weight lock is reverted to a light-weight lock
> > again shortly after being "contended" (during a safepoint in
> > deflate_idle_monitors() as in the stack above??). But why? Couldn't it
> > stay a heavy-weight lock, instead of being deflated only to be inflated
> > again shortly after?
> > 
> > In the example below, lock 0x96568a60 is inflated, deflated, inflated
> > again and then deflated again:
> > 
> > <writer thread='143'/>
> > Inflating object 0x96568a60 , mark 0x08677ffe , type org.jacorb.poa.RequestProcessor
> > <writer thread='10'/>
> > Deflating object 0x96568a60 , mark 0x08677ffe , type org.jacorb.poa.RequestProcessor
> > <writer thread='143'/>
> > Inflating object 0x96568a60 , mark 0x0918f4fe , type org.jacorb.poa.RequestProcessor
> > <writer thread='240'/>
> > Inflating object 0x79bceee8 , mark 0x08c0b662 , type java.lang.Object
> > <writer thread='243'/>
> > Inflating object 0x79bceeb0 , mark 0x0914e9ee , type org.jacorb.orb.ReplyReceiver
> > <writer thread='10'/>
> > Deflating object 0x79bceee8 , mark 0x08c0b662 , type java.lang.Object
> > Deflating object 0x79bceeb0 , mark 0x0914e9ee , type org.jacorb.orb.ReplyReceiver
> > Deflating object 0x96568a60 , mark 0x0918f4fe , type org.jacorb.poa.RequestProcessor
> > 
> > I was just curious whether this is the intended behavior in case of lock
> > contention, or whether we're seeing some pathologically strange thing
> > happen here.
> > 
> > Btw, I've also run the workload with HotSpot 19.0-b03 from the latest
> > JDK 7 build, with the same results.
> > 
> > Any comments appreciated!
> > 
> > Thanks,
> > Nick.
> > 
> 
>