CMS Sudden long pauses with large number of weak references

Thu Jan 19 13:37:00 PST 2012

Hi all

I have an application that is solr/lucene running on JDK 1.6 / 1.7 with 
CMS, 9Gb heap. The only unusual options here are DisableExplictGC and 
UseCompressedOops (which I do not think are the issue).

Generally the application has a heap usage of ~4G with a 2-3G float of 
continual garbage on top.

The application creates several large heap objects, some of these are 
sizeable arrays (~300mb), some of these are objects that have large 
retained graphs.

One of these is a WeakHashMap that in lucene stores IndexReaders (as the 
key) -> "Fields", this mapping is used to avoid disk access, there are 
not typically a large number of readers open at any time (at worst this 
could be say 25).

Generally we see the GC behavior being fairly solid across JVMS, however 
we get ever increasing amounts of concurrent-mode-failues with a marked 
spike of weak references, this only seems to appear when new indexes 
load on lucene, which is when the references are changed and when new 
versions of these caches are created.

---- %< ----
4747.399: [GC 4747.399: [ParNew4747.492: [SoftReference, 0 refs, 
0.0000120 secs]4747.492: [WeakReference, 5 refs, 0.0000050 
secs]4747.492: [FinalReference, 0 refs, 0.0000040 secs]4747.492: 
[PhantomReference, 0 refs, 0.0000030 secs]4747.492: [JNI Weak Reference, 
0.0000020 secs]: 133242K->11776K(153344K), 0.0925850 secs] 
7199972K->7187397K(9420160K), 0.0926840 secs] [Times: user=0.72 
sys=0.00, real=0.09 secs]
Total time for which application threads were stopped: 0.0931920 seconds
4747.530: [GC [1 CMS-initial-mark: 7175620K(9266816K)] 
7300153K(9420160K), 0.0044900 secs] [Times: user=0.00 sys=0.00, 
real=0.00 secs]
Total time for which application threads were stopped: 0.0398160 seconds
4747.535: [CMS-concurrent-mark-start]
4747.537: [GC 4747.537: [ParNew4747.568: [SoftReference, 0 refs, 
0.0000120 secs]4747.568: [WeakReference, 6 refs, 0.0000060 
secs]4747.568: [FinalReference, 0 refs, 0.0000040 secs]4747.568: 
[PhantomReference, 0 refs, 0.0000040 secs]4747.568: [JNI Weak Reference, 
0.0000020 secs] (promotion failed): 127253K->127965K(153344K), 0.0315280 
secs]4747.569: [CMS4755.725: [CMS-concurrent-mark: 8.157/8.190 secs] 
[Times: user=10.81 sys=3.49, real=8.19 secs]
  (concurrent mode failure)4757.436: [SoftReference, 0 refs, 0.0000050 
secs]4757.436: [WeakReference, 6360 refs, 0.0016500 secs]4757.437: 
[FinalReference, 462 refs, 0.0088580 secs]4757.446: [PhantomReference, 
114 refs, 0.0000220 secs]4757.446: [JNI Weak Reference, 0.0000070 secs]: 
7180161K->3603572K(9266816K), 17.1070460 secs] 
7302874K->3603572K(9420160K), [CMS Perm : 47059K->47044K(78380K)], 
17.1387610 secs] [Times: user=19.75 sys=3.49, real=17.14 secs]
Total time for which application threads were stopped: 17.1393510 seconds
Total time for which application threads were stopped: 0.0245300 seconds
---- >% ----

? Is this due to the defects fixed here 
http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/f1391adc6681, 
would -XX:+ParallelRefProcEnabled help here ?

Also is there anything that could explain the large number of suddenly 
processed and traced weak-references

Many thanks
-- Greg