Safepointing & Locking issue
dave.dice at oracle.com
Thu Jul 8 04:56:34 PDT 2010
On 2010-7-8, at 3:21 AM, Nicolas Michael wrote:
> Hi all,
> I have yesterday been investigating a performance problem with one of our
> applications where I observed some "strange" JVM behavior (which might be as
> expected, but I'm not sure...).
> It's a heavily multi-threaded Java application with about 300 threads, running
> on a 4 CPU Solaris x86 box using a 32bit HotSpot Client VM build 16.3-b01 (JDK
> 1.6.0_20-b02). It seems to make extensive use of synchronization and suffers
> badly from lock contention. Thread #10 (the VM Thread) eats close to one CPU,
> mostly doing lock deflations during safepoints in stacks like this:
> From a safepointing point of view, the process isn't doing much useful work -- it
> only gets short periods of work, interrupted by safepoints that take longer than
> the actual application run time -- at a rate of 1709 safepoints per second (see below):
> Application time: 0.0000450 seconds
> Total time for which application threads were stopped: 0.0002397 seconds
> Application time: 0.0001607 seconds
> Total time for which application threads were stopped: 0.0002603 seconds
> Application time: 0.0000469 seconds
> Total time for which application threads were stopped: 0.0011530 seconds
> From the perfdata statistics, I see
> sun.rt._sync_ContendedLockAttempts increasing by 12 / sec
> sun.rt._sync_Deflations increasing by 477 / sec
> sun.rt._sync_Inflations increasing by 477 / sec
> sun.rt._sync_Notifications increasing by 791 / sec
> sun.rt._sync_Parks increasing by 795 / sec
> sun.rt.safepoints increasing by 1709 / sec
> sun.rt.safepointTime is 0.82s and sun.rt.safepointSyncTime 0.15s per second.
> I guess we're using too much synchronization, Object.wait(ms) and notify() (or
> even notifyAll()) ... ;-) but I'm wondering whether it is "normal" to have that
> many safepoints and lock inflations/deflations because of that?
> My understanding was that safepoints are mainly caused by GC (not the case here)
> or deoptimization (should be visible with +LogCompilation, right? but nothing
> there either). Or bias revocation. Indeed, we do have some bias revocations
> (like this, see below), but this is at a much lower rate (only ~3.4 times per second).
Do you have a test case that's easily packaged up?
Inflation can be induced by a number of behaviors, such as wait(), contention, deoptimization, bias revocation, JNI monitor access, some hashCode actions, etc. By default, however, we don't trigger stop-the-world safepoints to deflate. Absent safepoints, the ultimate bound on the number of objectmonitors in play is just the number of live and dead objects in the heap. That is, there's no natural back-pressure on the number of monitors. If the number of monitors becomes too large the JVM can spend inordinate amounts of time in deflate_idle_monitors() scanning, trying to scavenge and deflate monitors, so there's both a time problem and space problem (arguably the space problem drives the time problem, but they manifest differently for customers).
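To make the inflation triggers above concrete, here is a minimal Java sketch (not from the original application; class and method names are invented for illustration). It exercises two of the paths named above: `Object.wait()`, which always requires a full ObjectMonitor and so inflates even an uncontended lock, and plain contention, where a second thread's attempt to take the thin lock forces inflation:

```java
// Hypothetical demo of two monitor-inflation triggers.
// (Names are illustrative; this is not HotSpot or application code.)
public class InflationDemo {
    static final Object lock = new Object();

    // wait() parks the caller on the monitor's wait set, which needs a
    // full ObjectMonitor -- so this inflates even with zero contention.
    static boolean waitBriefly() throws InterruptedException {
        synchronized (lock) {
            lock.wait(1); // inflation point: thin lock -> ObjectMonitor
        }
        return true;
    }

    // Two threads racing on the same monitor: the loser of the race on
    // the lock word causes inflation under contention.
    static int contend() throws InterruptedException {
        final int[] counter = {0};
        Runnable r = () -> {
            for (int i = 0; i < 100_000; i++) {
                synchronized (lock) { counter[0]++; }
            }
        };
        Thread a = new Thread(r), b = new Thread(r);
        a.start(); b.start();
        a.join(); b.join();
        return counter[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitBriefly());
        System.out.println(contend());
    }
}
```

Running a workload like this with the (diagnostic) monitor-tracing flags used for the logs below should show "Inflating object ..." lines for `lock`.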
There's a day-one assumption in various pieces of code that the object:monitor relationship is stable except over potential safepoints. That simplifies many operations but is the source of the issue you're seeing. Ultimately, it's probably better to deflate on the fly and decouple deflation from safepoints. (The old ExactVM implementation did just that.) It's relatively straightforward, but it touches a good deal of code. If you're curious, I've left more detailed comments in synchronizer.cpp.
In the mean time there are some palliative approaches under consideration to (a) induce STW safepoints to bound the number of extant monitors and in turn bound the cost of deflate_idle_monitors, and (b) partition the monitors so we only need scan those that were possibly inflated in the last STW epoch. We also have code percolating through the system that might cut down on some of the cases of inflation, which is why a test case would be of interest -- to see if the specific case you're running into might already be addressed.
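Palliative (a) amounts to adding back-pressure: once the extant-monitor population crosses a ceiling, force a scavenge pass so each pass stays bounded. A toy model of that policy (not HotSpot code; the pool, ceiling, and counters are all invented for illustration) might look like:

```java
// Toy model of palliative (a): bound the extant-monitor population by
// triggering a scavenge pass when it crosses a ceiling.
// (Illustrative only -- real deflation happens in deflate_idle_monitors().)
import java.util.ArrayDeque;
import java.util.Deque;

class MonitorPool {
    private final int ceiling;
    private final Deque<Object> extant = new ArrayDeque<>();
    private int scavenges = 0;

    MonitorPool(int ceiling) { this.ceiling = ceiling; }

    void inflate(Object monitor) {
        extant.add(monitor);
        if (extant.size() > ceiling) {  // back-pressure point
            scavenge();
        }
    }

    // Stand-in for a deflation pass; here every monitor counts as idle,
    // so the whole population is reclaimed.
    private void scavenge() {
        extant.clear();
        scavenges++;
    }

    int population()    { return extant.size(); }
    int scavengeCount() { return scavenges; }
}
```

The point of the ceiling is that each scavenge pass scans at most `ceiling + 1` monitors, so the per-safepoint cost no longer grows with the heap's total monitor population.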
I vaguely recall adding a perfdata counter that reflects the number of objectmonitors in the system. Were there any other rt._sync_* variables visible in perfdata?
> <writer thread='10'/>
> Revoking bias with potentially per-thread safepoint:
> Revoking bias of object 0x6df887d0 , mark 0x08ee3805 , type
> org.jacorb.poa.RequestQueue , prototype header 0x00000005 , allow rebias 0 ,
> requesting thread 0x09065800
> Revoked bias of currently-unlocked object
> <writer thread='149'/>
> Revoking bias by walking my own stack:
> Revoking bias of object 0x6df887c8 , mark 0x09065805 , type java.lang.Object ,
> prototype header 0x00000005 , allow rebias 0 , requesting thread 0x09065800
> Revoked bias of currently-locked object
> Inflating object 0x6df887c8 , mark 0x0906042a , type java.lang.Object
> Also, disabling biased locking has no effect on the magnitude of safepointing,
> so I concluded that safepointing must then be caused by lock inflation/deflation
> (which seems to be supported by the stack above).
> Afaik a lock is being inflated when the light-weight locking algorithm can't
> acquire the lock immediately ("contention"). It will then fall back to
> heavy-weight locking (OS mutex?) and notify (wake up) all waiting threads (?).
> We have about 40 lock inflations per contended lock attempt (see the perfdata
> statistics above), does that mean there were (on average) 40 threads waiting on
> that contended lock?
> Since we have the same amount of deflations as we have inflations, I would
> assume that each heavy-weight lock is shortly after being "contended" reverted
> to a light-weight lock again (during a safepoint in deflate_idle_monitors() as
> in the stack above??). But why? Couldn't it stay as a heavy-weight lock, instead
> of being deflated, only to be inflated shortly after again?
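The scenario described in the quoted questions -- many threads parked in `Object.wait()` on a single monitor and woken together by `notifyAll()` -- can be sketched as follows. This is a hedged toy, not the original application; the thread count, condition flag, and latch are all invented for illustration:

```java
// Sketch of many waiters on one monitor released by a single notifyAll().
// Each wait() call forces the lock to be (or stay) inflated.
import java.util.concurrent.CountDownLatch;

class WaiterDemo {
    static int runWaiters(int n) throws InterruptedException {
        final Object lock = new Object();
        final boolean[] ready = {false};
        final CountDownLatch done = new CountDownLatch(n);

        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                synchronized (lock) {
                    // Guard against spurious/early wakeups; a late-arriving
                    // thread sees ready == true and skips wait() entirely.
                    while (!ready[0]) {
                        try { lock.wait(); }
                        catch (InterruptedException e) { return; }
                    }
                }
                done.countDown();
            }).start();
        }

        Thread.sleep(100);          // crude: give the waiters time to park
        synchronized (lock) {
            ready[0] = true;
            lock.notifyAll();       // wakes all n waiters at once
        }
        done.await();
        return n;
    }
}
```

With n threads parked per monitor, one contended acquisition can coincide with many wait-induced inflations, which is consistent with the ~40:1 inflation-to-contended-attempt ratio asked about above.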
> In the example below, lock 0x96568a60 is being inflated, deflated, inflated
> again and then deflated again:
> <writer thread='143'/>
> Inflating object 0x96568a60 , mark 0x08677ffe , type org.jacorb.poa.RequestProcessor
> <writer thread='10'/>
> Deflating object 0x96568a60 , mark 0x08677ffe , type org.jacorb.poa.RequestProcessor
> <writer thread='143'/>
> Inflating object 0x96568a60 , mark 0x0918f4fe , type org.jacorb.poa.RequestProcessor
> <writer thread='240'/>
> Inflating object 0x79bceee8 , mark 0x08c0b662 , type java.lang.Object
> <writer thread='243'/>
> Inflating object 0x79bceeb0 , mark 0x0914e9ee , type org.jacorb.orb.ReplyReceiver
> <writer thread='10'/>
> Deflating object 0x79bceee8 , mark 0x08c0b662 , type java.lang.Object
> Deflating object 0x79bceeb0 , mark 0x0914e9ee , type org.jacorb.orb.ReplyReceiver
> Deflating object 0x96568a60 , mark 0x0918f4fe , type org.jacorb.poa.RequestProcessor
> I was just curious whether this is the intended behavior in case of lock
> contention, or whether we're seeing some pathologically strange thing happen here.
> Btw, I've also run the workload with HotSpot 19.0-b03 from the latest JDK 7
> build, with the same results.
> Any comments appreciated!
More information about the hotspot-runtime-dev mailing list