monitors explosion
Peter Levart
peter.levart at marand.si
Wed May 25 07:21:07 PDT 2011
Hello hotspot developers,
I'm posting here because I think this might interest you.
Here's a summary:
Our application has a boot-up sequence that spawns about 80 threads at the beginning where each of them executes a long running query into the database that preloads caches. We are using Oracle Coherence for that. At some point in this boot-up sequence where all 80 threads are busy inserting data into cache (each thread inserts into it's own cache instance - basically a ConcurrentHashMap with features that allow replication to oher nodes, etc...), the following happens to "deflate idle monitors" time (-XX:+TraceSafepointCleanupTime):
[deflating idle monitors, 0.0004058 secs]
[deflating idle monitors, 0.0003921 secs]
[deflating idle monitors, 0.0005102 secs]
[deflating idle monitors, 0.0011647 secs]
[deflating idle monitors, 0.0004439 secs]
[deflating idle monitors, 0.0003865 secs]
[deflating idle monitors, 0.0003829 secs]
[deflating idle monitors, 0.0003838 secs]
[deflating idle monitors, 0.0003962 secs]
[deflating idle monitors, 0.0003758 secs]
[deflating idle monitors, 0.0003691 secs]
[deflating idle monitors, 0.0030027 secs]
[deflating idle monitors, 0.0032613 secs]
[deflating idle monitors, 0.0095863 secs]
[deflating idle monitors, 0.0116110 secs]
[deflating idle monitors, 0.1404215 secs]
[deflating idle monitors, 0.1260821 secs]
[deflating idle monitors, 0.1520614 secs]
[deflating idle monitors, 0.2608265 secs]
[deflating idle monitors, 0.2195979 secs]
[deflating idle monitors, 0.4246853 secs]
[deflating idle monitors, 0.6519136 secs]
[deflating idle monitors, 0.5262590 secs]
[deflating idle monitors, 0.4404068 secs]
[deflating idle monitors, 0.4404227 secs]
[deflating idle monitors, 0.4404056 secs]
[deflating idle monitors, 0.4403642 secs]
[deflating idle monitors, 0.5276827 secs]
[deflating idle monitors, 0.4405319 secs]
[deflating idle monitors, 0.4406389 secs]
[deflating idle monitors, 0.4403989 secs]
[deflating idle monitors, 0.4403940 secs]
[deflating idle monitors, 0.4404595 secs]
[deflating idle monitors, 0.4403539 secs]
...
... and stabilizes at the end of boot-up sequence to:
...
[deflating idle monitors, 0.7225519 secs]
[deflating idle monitors, 0.7225144 secs]
[deflating idle monitors, 0.7225794 secs]
[deflating idle monitors, 0.7225868 secs]
[deflating idle monitors, 0.7225747 secs]
[deflating idle monitors, 0.7225810 secs]
[deflating idle monitors, 0.7227106 secs]
[deflating idle monitors, 0.7227789 secs]
[deflating idle monitors, 0.7226465 secs]
[deflating idle monitors, 0.7226106 secs]
At the end of boot-up the system is idle and safepoints are not observed. If I then issue a HTTP request on the embeded Jetty container that executes some platform JNI operations (Sockets/NIO/File IO) it triggers 4 safepoints each taking 0.7 sec and request completes in roughly 3 secs.
jstat -J-Djstat.showUnsupported=true -snap 27205 | grep sun.rt._sync_
then prints the following:
sun.rt._sync_ContendedLockAttempts=40303
sun.rt._sync_Deflations=19882562
sun.rt._sync_EmptyNotifications=0
sun.rt._sync_FailedSpins=0
sun.rt._sync_FutileWakeups=1912
sun.rt._sync_Inflations=19859801
sun.rt._sync_MonExtant=2166144
sun.rt._sync_MonInCirculation=0
sun.rt._sync_MonScavenged=0
sun.rt._sync_Notifications=23310881
sun.rt._sync_Parks=23333919
sun.rt._sync_PrivateA=0
sun.rt._sync_PrivateB=0
sun.rt._sync_SlowEnter=0
sun.rt._sync_SlowExit=0
sun.rt._sync_SlowNotify=0
sun.rt._sync_SlowNotifyAll=0
sun.rt._sync_SuccessfulSpins=0
...the same boot-up sequence observed via the 2s samples of _sync_MonExtant variable:
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=640
sun.rt._sync_MonExtant=768
sun.rt._sync_MonExtant=768
sun.rt._sync_MonExtant=768
sun.rt._sync_MonExtant=768
sun.rt._sync_MonExtant=3200
sun.rt._sync_MonExtant=3200
sun.rt._sync_MonExtant=3200
sun.rt._sync_MonExtant=3200
sun.rt._sync_MonExtant=3200
sun.rt._sync_MonExtant=3200
sun.rt._sync_MonExtant=6144
sun.rt._sync_MonExtant=6400
sun.rt._sync_MonExtant=19456
sun.rt._sync_MonExtant=21632
sun.rt._sync_MonExtant=21632
sun.rt._sync_MonExtant=21632
sun.rt._sync_MonExtant=21632
sun.rt._sync_MonExtant=21632
sun.rt._sync_MonExtant=21632
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=286080
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=308608
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=546432
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=901760
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1188736
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1572480
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
sun.rt._sync_MonExtant=1770112
...
...
sun.rt._sync_MonExtant=2166144
It seems that somehow monitors accumulate on some thread's private free lists and those threads don't die so omFlush is never called for them (they live happily for ever after in a thread pool).
We spawn those 80 inserting threads via FixedThreadPoolExecutor and after all tasks are finished we shutdown the executor, so those 80 threads are gone, but...
...is it possible for a thread to call omRelease for a monitor that was not omAlloc-ated by this same thread? In this case object monitors, allocated by one thread would get accumulated on a free list of another thread and if this thread is a long-running thread those monitors would stay there forever...
Here's a test program that on my machine (i3 CPU, Linux) accumulates arround 1.6M monitors:
package monitors;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;
/**
* @author Peter Levart
* @created 5/25/11 @ 3:35 PM
*/
public class MonitorsExplosion
{
BlockingQueue<Object> queue = new SynchronousQueue<Object>();
class Allocator extends Thread
{
final int allocations;
Allocator(int allocations)
{
this.allocations = allocations;
}
@Override
public void run()
{
for (int i = 0; i < allocations; i++)
{
Object o = new Object();
synchronized (o)
{
try
{
queue.put(o);
Thread.yield();
}
catch (InterruptedException e)
{
// ignore
}
}
}
}
}
class Collector extends Thread
{
@Override
public void run()
{
int hashSum = 0;
for (; ; )
{
try
{
Object o = queue.take();
synchronized (o)
{
hashSum += o.hashCode();
}
}
catch (InterruptedException e)
{
// ignore
}
}
}
}
void detonate() throws InterruptedException
{
Allocator[] allocators = new Allocator[8];
for (int i = 0; i < allocators.length; i++)
{
Allocator allocator = new Allocator(1000000);
allocator.start();
allocators[i] = allocator;
}
System.out.println("Allocators started");
Collector collector = new Collector();
collector.start();
System.out.println("Collector started");
for (Allocator allocator : allocators)
allocator.join();
System.out.println("Allocators finished");
collector.join(); // never ending...
System.out.println("Collector finished");
}
public static void main(String[] args) throws InterruptedException
{
new MonitorsExplosion().detonate();
}
}
...I don't know if they get returned to global free list or if they just lie there on Collectors private list though.
Regards,
Peter Levart
More information about the hotspot-runtime-dev
mailing list