monitors explosion
Y. Srinivas Ramakrishna
y.s.ramakrishna at oracle.com
Wed May 25 14:01:24 PDT 2011
Hi Peter --
Which version of the JDK elicits this behaviour? The monitor leakage
sounds vaguely like a problem that i encountered before and which
was fixed a while back, but this may well be a new problem,
and i can't dig up the CR for the problem that I recall in
this code, or where the fix went to. (Karen might recall
the details because she pushed the fix/workaround for that one.)
-- ramki
On 5/25/2011 7:21 AM, Peter Levart wrote:
> Hello hotspot developers,
>
> I'm posting here because I think this might interest you.
>
> Here's a summary:
>
> Our application has a boot-up sequence that spawns about 80 threads at the beginning where each of them executes a long running query into the database that preloads caches. We are using Oracle Coherence for that. At some point in this boot-up sequence where all 80 threads are busy inserting data into cache (each thread inserts into it's own cache instance - basically a ConcurrentHashMap with features that allow replication to oher nodes, etc...), the following happens to "deflate idle monitors" time (-XX:+TraceSafepointCleanupTime):
>
> [deflating idle monitors, 0.0004058 secs]
> [deflating idle monitors, 0.0003921 secs]
> [deflating idle monitors, 0.0005102 secs]
> [deflating idle monitors, 0.0011647 secs]
> [deflating idle monitors, 0.0004439 secs]
> [deflating idle monitors, 0.0003865 secs]
> [deflating idle monitors, 0.0003829 secs]
> [deflating idle monitors, 0.0003838 secs]
> [deflating idle monitors, 0.0003962 secs]
> [deflating idle monitors, 0.0003758 secs]
> [deflating idle monitors, 0.0003691 secs]
> [deflating idle monitors, 0.0030027 secs]
> [deflating idle monitors, 0.0032613 secs]
> [deflating idle monitors, 0.0095863 secs]
> [deflating idle monitors, 0.0116110 secs]
> [deflating idle monitors, 0.1404215 secs]
> [deflating idle monitors, 0.1260821 secs]
> [deflating idle monitors, 0.1520614 secs]
> [deflating idle monitors, 0.2608265 secs]
> [deflating idle monitors, 0.2195979 secs]
> [deflating idle monitors, 0.4246853 secs]
> [deflating idle monitors, 0.6519136 secs]
> [deflating idle monitors, 0.5262590 secs]
> [deflating idle monitors, 0.4404068 secs]
> [deflating idle monitors, 0.4404227 secs]
> [deflating idle monitors, 0.4404056 secs]
> [deflating idle monitors, 0.4403642 secs]
> [deflating idle monitors, 0.5276827 secs]
> [deflating idle monitors, 0.4405319 secs]
> [deflating idle monitors, 0.4406389 secs]
> [deflating idle monitors, 0.4403989 secs]
> [deflating idle monitors, 0.4403940 secs]
> [deflating idle monitors, 0.4404595 secs]
> [deflating idle monitors, 0.4403539 secs]
> ...
> ... and stabilizes at the end of boot-up sequence to:
> ...
> [deflating idle monitors, 0.7225519 secs]
> [deflating idle monitors, 0.7225144 secs]
> [deflating idle monitors, 0.7225794 secs]
> [deflating idle monitors, 0.7225868 secs]
> [deflating idle monitors, 0.7225747 secs]
> [deflating idle monitors, 0.7225810 secs]
> [deflating idle monitors, 0.7227106 secs]
> [deflating idle monitors, 0.7227789 secs]
> [deflating idle monitors, 0.7226465 secs]
> [deflating idle monitors, 0.7226106 secs]
>
> At the end of boot-up the system is idle and safepoints are not observed. If I then issue a HTTP request on the embeded Jetty container that executes some platform JNI operations (Sockets/NIO/File IO) it triggers 4 safepoints each taking 0.7 sec and request completes in roughly 3 secs.
>
> jstat -J-Djstat.showUnsupported=true -snap 27205 | grep sun.rt._sync_
>
> then prints the following:
>
> sun.rt._sync_ContendedLockAttempts=40303
> sun.rt._sync_Deflations=19882562
> sun.rt._sync_EmptyNotifications=0
> sun.rt._sync_FailedSpins=0
> sun.rt._sync_FutileWakeups=1912
> sun.rt._sync_Inflations=19859801
> sun.rt._sync_MonExtant=2166144
> sun.rt._sync_MonInCirculation=0
> sun.rt._sync_MonScavenged=0
> sun.rt._sync_Notifications=23310881
> sun.rt._sync_Parks=23333919
> sun.rt._sync_PrivateA=0
> sun.rt._sync_PrivateB=0
> sun.rt._sync_SlowEnter=0
> sun.rt._sync_SlowExit=0
> sun.rt._sync_SlowNotify=0
> sun.rt._sync_SlowNotifyAll=0
> sun.rt._sync_SuccessfulSpins=0
>
>
> ...the same boot-up sequence observed via the 2s samples of _sync_MonExtant variable:
>
>
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=640
> sun.rt._sync_MonExtant=768
> sun.rt._sync_MonExtant=768
> sun.rt._sync_MonExtant=768
> sun.rt._sync_MonExtant=768
> sun.rt._sync_MonExtant=3200
> sun.rt._sync_MonExtant=3200
> sun.rt._sync_MonExtant=3200
> sun.rt._sync_MonExtant=3200
> sun.rt._sync_MonExtant=3200
> sun.rt._sync_MonExtant=3200
> sun.rt._sync_MonExtant=6144
> sun.rt._sync_MonExtant=6400
> sun.rt._sync_MonExtant=19456
> sun.rt._sync_MonExtant=21632
> sun.rt._sync_MonExtant=21632
> sun.rt._sync_MonExtant=21632
> sun.rt._sync_MonExtant=21632
> sun.rt._sync_MonExtant=21632
> sun.rt._sync_MonExtant=21632
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=286080
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=308608
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=546432
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=901760
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1188736
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1572480
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> sun.rt._sync_MonExtant=1770112
> ...
> ...
> sun.rt._sync_MonExtant=2166144
>
>
> It seems that somehow monitors accumulate on some thread's private free lists and those threads don't die so omFlush is never called for them (they live happily for ever after in a thread pool).
>
> We spawn those 80 inserting threads via FixedThreadPoolExecutor and after all tasks are finished we shutdown the executor, so those 80 threads are gone, but...
>
> ...is it possible for a thread to call omRelease for a monitor that was not omAlloc-ated by this same thread? In this case object monitors, allocated by one thread would get accumulated on a free list of another thread and if this thread is a long-running thread those monitors would stay there forever...
>
> Here's a test program that on my machine (i3 CPU, Linux) accumulates arround 1.6M monitors:
>
>
> package monitors;
>
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.SynchronousQueue;
>
> /**
> * @author Peter Levart
> * @created 5/25/11 @ 3:35 PM
> */
> public class MonitorsExplosion
> {
> BlockingQueue<Object> queue = new SynchronousQueue<Object>();
>
> class Allocator extends Thread
> {
> final int allocations;
>
> Allocator(int allocations)
> {
> this.allocations = allocations;
> }
>
> @Override
> public void run()
> {
> for (int i = 0; i< allocations; i++)
> {
> Object o = new Object();
> synchronized (o)
> {
> try
> {
> queue.put(o);
> Thread.yield();
> }
> catch (InterruptedException e)
> {
> // ignore
> }
> }
> }
> }
> }
>
> class Collector extends Thread
> {
> @Override
> public void run()
> {
> int hashSum = 0;
>
> for (; ; )
> {
> try
> {
> Object o = queue.take();
> synchronized (o)
> {
> hashSum += o.hashCode();
> }
> }
> catch (InterruptedException e)
> {
> // ignore
> }
> }
> }
> }
>
> void detonate() throws InterruptedException
> {
> Allocator[] allocators = new Allocator[8];
> for (int i = 0; i< allocators.length; i++)
> {
> Allocator allocator = new Allocator(1000000);
> allocator.start();
> allocators[i] = allocator;
> }
>
> System.out.println("Allocators started");
>
> Collector collector = new Collector();
> collector.start();
>
> System.out.println("Collector started");
>
> for (Allocator allocator : allocators)
> allocator.join();
>
> System.out.println("Allocators finished");
>
> collector.join(); // never ending...
>
> System.out.println("Collector finished");
> }
>
> public static void main(String[] args) throws InterruptedException
> {
> new MonitorsExplosion().detonate();
> }
> }
>
>
> ...I don't know if they get returned to global free list or if they just lie there on Collectors private list though.
>
>
> Regards,
>
> Peter Levart
More information about the hotspot-runtime-dev
mailing list