Safepointing & Locking issue

Karen Kinnear karen.kinnear at oracle.com
Thu Jul 8 08:06:29 PDT 2010


Nicolas,

It would also be very helpful if you could possibly gather information to
determine why you are running so many safepoints.

Could you possibly try running with the following flags on a fairly current
jdk7 binary?
-XX:+PrintSafepointStatistics -XX:+TraceSafepointCleanupTime

That should tell you why you are running so many safepoints (which would not
be caused by lock inflation/deflation). It will also tell you details of time
spent in different components of the safepoint cleanup time, including lock
deflation. Note that the flags require a development build, like fastdebug.
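
For example, an invocation along these lines (the fastdebug path and the
application jar are only placeholders for whatever you normally run):

  $JDK7_FASTDEBUG/bin/java \
      -XX:+PrintSafepointStatistics \
      -XX:+TraceSafepointCleanupTime \
      -jar yourapp.jar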

thanks,
Karen

On Jul 8, 2010, at 7:56 AM, Dave Dice wrote:

>
> On 2010-7-8, at 3:21 AM, Nicolas Michael wrote:
>
>> Hi all,
>>
>> Yesterday I was investigating a performance problem with one of our
>> applications where I observed some "strange" JVM behavior (which might be
>> as expected, but I'm not sure...).
>>
>> It's a heavily multi-threaded Java application with about 300 threads,
>> running on a 4 CPU Solaris x86 box using a 32-bit HotSpot Client VM build
>> 16.3-b01 (JDK 1.6.0_20-b02). It seems to make extensive use of
>> synchronization and suffers badly from lock contention. Thread #10 (the VM
>> Thread) eats close to one CPU, mostly doing lock deflations during
>> safepoints in stacks like this:
>>
>>             libjvm.so`__1cNObjectMonitorHis_busy6kM_i_+0x26
>>             libjvm.so`__1cSObjectSynchronizerVdeflate_idle_monitors6F_v_+0x210
>>             libjvm.so`__1cUSafepointSynchronizeQdo_cleanup_tasks6F_v_+0x19
>>             libjvm.so`__1cUSafepointSynchronizeFbegin6F_v_+0x3d0
>>             libjvm.so`__1cIVMThreadEloop6M_v_+0x26c
>>             libjvm.so`__1cIVMThreadDrun6M_v_+0x88
>>             libjvm.so`java_start+0xf9
>>             libc.so.1`_thr_setup+0x4e
>>             libc.so.1`_lwp_start
>>             287
>>
>> From a safepointing point of view, the process isn't doing much useful
>> work -- it only gets short periods of work interrupted by safepoints that
>> take longer than the actual application runtime -- at a rate of 1709
>> safepoints per second (see below):
>>
>> Application time: 0.0000450 seconds
>> Total time for which application threads were stopped: 0.0002397 seconds
>> Application time: 0.0001607 seconds
>> Total time for which application threads were stopped: 0.0002603 seconds
>> Application time: 0.0000469 seconds
>> Total time for which application threads were stopped: 0.0011530 seconds
>> ..
>>
>> From the perfdata statistics, I see
>> sun.rt._sync_ContendedLockAttempts    increasing by   12 / sec
>> sun.rt._sync_Deflations               increasing by  477 / sec
>> sun.rt._sync_Inflations               increasing by  477 / sec
>> sun.rt._sync_Notifications            increasing by  791 / sec
>> sun.rt._sync_Parks                    increasing by  795 / sec
>> sun.rt.safepoints                     increasing by 1709 / sec
>> sun.rt.safepointTime is 0.82s and sun.rt.safepointSyncTime 0.15s per second.
>>
>> I guess we're using too much synchronization, Object.wait(ms) and notify()
>> (or even notifyAll()) ... ;-) but I'm wondering whether it was "normal" to
>> have that many safepoints and lock inflations/deflations because of that?
>>
>> My understanding was that safepoints are mainly caused by GC (not the case
>> here) or deoptimization (should be visible with +LogCompilation, right? but
>> nothing there either). Or bias revocation. Indeed, we do have some bias
>> revocations (like this, see below), but this is at a much lower rate (only
>> ~3.4 times per second):
>
> Hi Nick,
>
> Do you have a test case that's easily packaged up?
>
> Inflation can be induced by a number of behaviors, such as wait(),
> contention, deoptimization, bias revocation, JNI monitor access, some
> hashCode actions, etc. By default, however, we don't trigger stop-the-world
> safepoints to deflate. Absent safepoints, the ultimate bound on the number
> of ObjectMonitors in play is just the number of live and dead objects in
> the heap. That is, there's no natural back-pressure on the number of
> monitors. If the number of monitors becomes too large, the JVM can spend
> inordinate amounts of time in deflate_idle_monitors() scanning, trying to
> scavenge and deflate monitors, so there's both a time problem and a space
> problem (arguably the space problem drives the time problem, but they
> manifest differently for customers).
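>
> As a trivial illustration (only a sketch, not anything from your app): even
> a completely uncontended timed wait forces inflation, because the waiting
> thread has to park on a full ObjectMonitor:
>
>     public class UncontendedInflation {
>         public static void main(String[] args) throws InterruptedException {
>             Object lock = new Object();
>             synchronized (lock) {
>                 // No other thread ever touches 'lock', yet wait() must
>                 // convert the stack-lock into a heavyweight ObjectMonitor
>                 // so this thread can be parked and woken or timed out.
>                 lock.wait(10);
>             }
>         }
>     }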
>
> There's a day-one assumption in various pieces of code that the
> object:monitor relationship is stable except over potential safepoints.
> That simplifies many operations but is the source of the issue you're
> seeing. Ultimately, it's probably better to deflate on the fly and decouple
> safepoints from deflation. (The old ExactVM implementation did just that.)
> It's relatively straightforward, but touches a good deal of code. If you're
> curious, I've left more detailed comments in synchronizer.cpp.
>
> In the meantime there are some palliative approaches under consideration to
> (a) induce STW safepoints to bound the number of extant monitors and in
> turn bound the cost of deflate_idle_monitors, and (b) partition the
> monitors so we only need to scan those that were possibly inflated in the
> last STW epoch. We also have code percolating through the system that might
> cut down on some of the cases of inflation, which is why a test case would
> be of interest -- to see if the specific case you're running into might
> already be addressed.
>
> I vaguely recall adding a perfdata counter that reflects the number of
> ObjectMonitors in the system. Were there any other rt._sync_* variables
> visible in perfdata?
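>
> If it helps, a small jvmstat-based sketch along these lines (tools.jar on
> the classpath, the target pid as the argument; the class name is just an
> example) should dump everything under sun.rt for a running VM:
>
>     import java.util.List;
>     import sun.jvmstat.monitor.*;
>
>     public class DumpSyncCounters {
>         public static void main(String[] args) throws Exception {
>             // args[0] is the pid of the target JVM, e.g. "12345"
>             VmIdentifier vmid = new VmIdentifier(args[0]);
>             MonitoredHost host = MonitoredHost.getMonitoredHost(vmid);
>             MonitoredVm vm = host.getMonitoredVm(vmid);
>             // Print every counter whose name starts with "sun.rt."
>             List counters = vm.findByPattern("sun\\.rt\\..*");
>             for (Object o : counters) {
>                 Monitor m = (Monitor) o;
>                 System.out.println(m.getName() + " = " + m.getValue());
>             }
>             host.detach(vm);
>         }
>     }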
>
> Regards
> Dave
>
>
>
>>
>> <writer thread='10'/>
>> Revoking bias with potentially per-thread safepoint:
>> Revoking bias of object 0x6df887d0 , mark 0x08ee3805 , type org.jacorb.poa.RequestQueue , prototype header 0x00000005 , allow rebias 0 , requesting thread 0x09065800
>> Revoked bias of currently-unlocked object
>> <writer thread='149'/>
>> Revoking bias by walking my own stack:
>> Revoking bias of object 0x6df887c8 , mark 0x09065805 , type java.lang.Object , prototype header 0x00000005 , allow rebias 0 , requesting thread 0x09065800
>> Revoked bias of currently-locked object
>> Inflating object 0x6df887c8 , mark 0x0906042a , type java.lang.Object
>>
>> Also, disabling biased locking has no effect on the magnitude of
>> safepointing, so I concluded that safepointing must then be caused by lock
>> inflation/deflation (which seems to be supported by the stack above).
>>
>> AFAIK a lock is being inflated when the lightweight locking algorithm can't
>> acquire the lock immediately ("contention"). It will then fall back to
>> heavyweight locking (OS mutex?) and notify (wake up) all waiting threads
>> (?). We have about 40 lock inflations per contended lock attempt (see the
>> perfdata statistics above); does that mean there were (on average) 40
>> threads waiting on that contended lock?
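>>
>> (For illustration, roughly this kind of pattern is what I mean -- just a
>> sketch, not our actual code: two threads hammering one monitor so that the
>> lightweight fast path fails and the lock gets inflated.)
>>
>>     public class ContendedInflation {
>>         public static void main(String[] args) throws InterruptedException {
>>             final Object lock = new Object();
>>             Runnable r = new Runnable() {
>>                 public void run() {
>>                     for (int i = 0; i < 1000000; i++) {
>>                         synchronized (lock) {
>>                             // Tiny critical section; with two threads the
>>                             // lightweight CAS on the mark word frequently
>>                             // fails and the lock is inflated to a monitor.
>>                         }
>>                     }
>>                 }
>>             };
>>             Thread t1 = new Thread(r), t2 = new Thread(r);
>>             t1.start(); t2.start();
>>             t1.join(); t2.join();
>>         }
>>     }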
>>
>> Since we have the same amount of deflations as we have inflations, I would
>> assume that each heavyweight lock is reverted to a lightweight lock shortly
>> after being "contended" (during a safepoint in deflate_idle_monitors() as
>> in the stack above??). But why? Couldn't it stay a heavyweight lock,
>> instead of being deflated, only to be inflated again shortly after?
>>
>> In the example below, lock 0x96568a60 is being inflated, deflated, inflated
>> again and then deflated again:
>>
>> <writer thread='143'/>
>> Inflating object 0x96568a60 , mark 0x08677ffe , type org.jacorb.poa.RequestProcessor
>> <writer thread='10'/>
>> Deflating object 0x96568a60 , mark 0x08677ffe , type org.jacorb.poa.RequestProcessor
>> <writer thread='143'/>
>> Inflating object 0x96568a60 , mark 0x0918f4fe , type org.jacorb.poa.RequestProcessor
>> <writer thread='240'/>
>> Inflating object 0x79bceee8 , mark 0x08c0b662 , type java.lang.Object
>> <writer thread='243'/>
>> Inflating object 0x79bceeb0 , mark 0x0914e9ee , type org.jacorb.orb.ReplyReceiver
>> <writer thread='10'/>
>> Deflating object 0x79bceee8 , mark 0x08c0b662 , type java.lang.Object
>> Deflating object 0x79bceeb0 , mark 0x0914e9ee , type org.jacorb.orb.ReplyReceiver
>> Deflating object 0x96568a60 , mark 0x0918f4fe , type org.jacorb.poa.RequestProcessor
>>
>> I was just curious whether this is the intended behavior in case of lock
>> contention, or whether we're seeing some pathological strange thing happen
>> here.
>>
>> Btw, I've also run the workload with HotSpot 19.0-b03 from the latest JDK 7
>> build with the same results.
>>
>> Any comments appreciated!
>>
>> Thanks,
>> Nick.
>>
>


