CMS Sudden long pauses with large number of weak references

Greg Bowyer gbowyer at fastmail.co.uk
Fri Jan 20 10:48:32 PST 2012


I didn't know this was in a forthcoming release, so I had already
compiled JDK7 from tip with the change (since it's so small and simple).

I still saw some pauses, although I think that might be down to CMS
starting too late, so I have lowered the occupancy fraction and started
testing.

So far, with the fix for 7112034
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034>,
parallel reference processing (-XX:+ParallelRefProcEnabled) and an
occupancy fraction of 70, I am seeing fairly good results: the worst
pause time so far is ~200 ms, which is far better than the multi-second
pauses I had before.
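
For reference, the full set of GC flags I am testing with looks roughly
like the following (flag spellings from memory, so double-check them
against your JDK; the logging flags are only there to produce the
per-reference-type timings and stopped-time lines visible in the log
below):

---- %< ----
java -Xms9g -Xmx9g \
     -XX:+UseConcMarkSweepGC \
     -XX:+UseCompressedOops \
     -XX:+DisableExplicitGC \
     -XX:+ParallelRefProcEnabled \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -verbose:gc -XX:+PrintGCDetails \
     -XX:+PrintReferenceGC \
     -XX:+PrintGCApplicationStoppedTime \
     ...
---- >% ----

(-XX:+UseCMSInitiatingOccupancyOnly may also be worth setting so the
lowered fraction is actually honoured rather than adaptively adjusted.)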

I will let a few more tests run, then try JDK7u4, and let you know if
this helps.

Thank you for confirming my thoughts on this.

-- Greg
On 19/01/12 19:40, Krystal Mok wrote:
> Hi Greg,
>
> This might just be another case of "7112034: Parallel CMS fails to 
> properly mark reference objects" [1], which was only fixed recently 
> and isn't delivered in any FCS releases yet.
>
> Could you try a recent JDK7u4 preview [2] and see if the problem goes 
> away? Or try -XX:-CMSConcurrentMTEnabled, which should work around 
> this particular bug, although you'll get longer CMS collection cycles.
>
> - Kris
>
> [1]: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034
> [2]: http://jdk7.java.net/download.html
>
> On Fri, Jan 20, 2012 at 5:37 AM, Greg Bowyer <gbowyer at fastmail.co.uk> wrote:
>
>     Hi all
>
>     I have a Solr/Lucene application running on JDK 1.6 / 1.7 with CMS
>     and a 9 GB heap. The only unusual options here are
>     -XX:+DisableExplicitGC and -XX:+UseCompressedOops (which I do not
>     think are the issue).
>
>     Generally the application has a heap usage of ~4G with a 2-3G float of
>     continual garbage on top.
>
>     The application creates several large heap objects: some are
>     sizeable arrays (~300 MB), others are objects with large retained
>     graphs.
>
>     One of these is a WeakHashMap that Lucene uses to map IndexReaders
>     (as the key) to "Fields"; the mapping is used to avoid disk access.
>     There are not typically a large number of readers open at any one
>     time (at worst, say, 25).
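>
>     Roughly, the pattern is something like the following (a simplified
>     sketch of the mapping just described, not the actual Lucene code;
>     readFieldsFromDisk is a stand-in for the real loading logic):
>
>     ---- %< ----
>     // Values stay reachable only while their reader key is strongly
>     // referenced; once a reader is dropped, the entry can be cleared.
>     private final Map<IndexReader, Fields> fieldCache =
>         Collections.synchronizedMap(new WeakHashMap<IndexReader, Fields>());
>
>     Fields getFields(IndexReader reader) throws IOException {
>         Fields fields = fieldCache.get(reader);
>         if (fields == null) {
>             fields = readFieldsFromDisk(reader);  // expensive, so cache it
>             fieldCache.put(reader, fields);
>         }
>         return fields;
>     }
>     ---- >% ----
>
>     When a new index version loads, the old readers are discarded and
>     the cache is rebuilt against the new ones, so a whole batch of map
>     entries (WeakReferences keyed on the old readers) becomes clearable
>     at roughly the same time.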
>
>     Generally the GC behaviour is fairly solid across JVMs; however, we
>     get ever-increasing numbers of concurrent mode failures accompanied
>     by a marked spike of weak references. This only seems to appear when
>     new indexes are loaded in Lucene, which is when the references
>     change and new versions of these caches are created.
>
>     ---- %< ----
>     4747.399: [GC 4747.399: [ParNew4747.492: [SoftReference, 0 refs,
>     0.0000120 secs]4747.492: [WeakReference, 5 refs, 0.0000050
>     secs]4747.492: [FinalReference, 0 refs, 0.0000040 secs]4747.492:
>     [PhantomReference, 0 refs, 0.0000030 secs]4747.492: [JNI Weak
>     Reference,
>     0.0000020 secs]: 133242K->11776K(153344K), 0.0925850 secs]
>     7199972K->7187397K(9420160K), 0.0926840 secs] [Times: user=0.72
>     sys=0.00, real=0.09 secs]
>     Total time for which application threads were stopped: 0.0931920
>     seconds
>     4747.530: [GC [1 CMS-initial-mark: 7175620K(9266816K)]
>     7300153K(9420160K), 0.0044900 secs] [Times: user=0.00 sys=0.00,
>     real=0.00 secs]
>     Total time for which application threads were stopped: 0.0398160
>     seconds
>     4747.535: [CMS-concurrent-mark-start]
>     4747.537: [GC 4747.537: [ParNew4747.568: [SoftReference, 0 refs,
>     0.0000120 secs]4747.568: [WeakReference, 6 refs, 0.0000060
>     secs]4747.568: [FinalReference, 0 refs, 0.0000040 secs]4747.568:
>     [PhantomReference, 0 refs, 0.0000040 secs]4747.568: [JNI Weak
>     Reference,
>     0.0000020 secs] (promotion failed): 127253K->127965K(153344K),
>     0.0315280
>     secs]4747.569: [CMS4755.725: [CMS-concurrent-mark: 8.157/8.190 secs]
>     [Times: user=10.81 sys=3.49, real=8.19 secs]
>      (concurrent mode failure)4757.436: [SoftReference, 0 refs, 0.0000050
>     secs]4757.436: [WeakReference, 6360 refs, 0.0016500 secs]4757.437:
>     [FinalReference, 462 refs, 0.0088580 secs]4757.446: [PhantomReference,
>     114 refs, 0.0000220 secs]4757.446: [JNI Weak Reference, 0.0000070
>     secs]:
>     7180161K->3603572K(9266816K), 17.1070460 secs]
>     7302874K->3603572K(9420160K), [CMS Perm : 47059K->47044K(78380K)],
>     17.1387610 secs] [Times: user=19.75 sys=3.49, real=17.14 secs]
>     Total time for which application threads were stopped: 17.1393510
>     seconds
>     Total time for which application threads were stopped: 0.0245300
>     seconds
>     ---- >% ----
>
>     Is this due to the defects fixed here:
>     http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/f1391adc6681 ?
>     Would -XX:+ParallelRefProcEnabled help here?
>
>     Also, is there anything that could explain the large number of weak
>     references suddenly being processed and traced?
>
>     Many thanks
>     -- Greg
>     _______________________________________________
>     hotspot-gc-use mailing list
>     hotspot-gc-use at openjdk.java.net
>     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
