Re: Unexplainable events in GC logs

Matt Miller matthew.miller at forgerock.com
Thu Jan 23 03:58:48 PST 2014


Hi Denny,

Another cause of a Full GC that you did not list: running jmap against 
the process.

If you run a histo:live or take a live-object heap dump, a Full GC will 
happen (and it does not show up as a System GC).
Is it possible that you are taking a heap histogram or heap dump using 
the live option?
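
One way to separate the two cases in a log, as a sketch: your list item 3
below notes that explicit System.gc() collections are tagged "System" in
the output, while a jmap-forced Full GC carries no tag. The first sample
line here is a hypothetical System.gc() example; the second is abridged
from N2's log.

```python
# Sketch: classify Full GC lines from a HotSpot GC log. An explicit
# System.gc() full collection is tagged "(System)"; a Full GC forced by
# `jmap -histo:live` or a live heap dump carries no tag, so it looks
# unexplained. First line is hypothetical, second abridged from this thread.
sample = [
    "768355.210: [Full GC (System) [CMS: 501963K->496288K(7340032K)]",
    "768241.414: [Full GC [CMS: 491947K->464864K(7340032K)]",
]

explicit = [line for line in sample if "Full GC (System)" in line]
untagged = [line for line in sample if "Full GC" in line
            and "(System)" not in line]

print(len(explicit), len(untagged))  # prints: 1 1
```

If the mystery Full GCs are all untagged, a tool such as jmap with the
live option is a plausible culprit.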

-Matt

On 1/23/14, 3:38 AM, Denny Kettwig wrote:
>
> Hey folks,
>
> In one of our recent cluster systems we found two unexplainable events 
> in the GC logs. I'd like to bring these events to your attention and 
> ask for your advice. We are running a clustered JBoss system with four 
> nodes; every node has the same configuration. We use the following 
> parameters:
>
> -Xms10g
>
> -Xmx10g
>
> -Xmn3g
>
> -Xss2048k
>
> -XX:+ExplicitGCInvokesConcurrent
>
> -XX:+CMSClassUnloadingEnabled
>
> -XX:+UseParNewGC
>
> -XX:+UseConcMarkSweepGC
>
> -XX:ParallelGCThreads=22
>
> -XX:SurvivorRatio=8
>
> -XX:TargetSurvivorRatio=90
>
> -XX:PermSize=512M
>
> -XX:MaxPermSize=512m
>
> -Dsun.rmi.dgc.client.gcInterval=3600000
>
> -Dsun.rmi.dgc.server.gcInterval=3600000
>
> *Event 1*
>
> The first event is a period of very frequent CMS collections on the 
> first node lasting about 11 hours, with a peak of 258 CMS collections 
> per hour. As far as I can tell, this activity starts and ends without 
> any apparent trigger: old-space usage stays below 10% for the whole 
> time frame, so I do not know what caused these CMS collections. We 
> have seen similar events on this cluster before, but at much higher 
> heap usage (above 90%) and under high load, most likely with heavy 
> heap fragmentation; those episodes ended with a single very long Full 
> GC that defragmented the heap. None of that applies to the current 
> event. *I attached the relevant part of the GC log*.
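
For scale, the quoted peak rate works out to roughly one CMS cycle every
14 seconds:

```python
# Quick arithmetic on the peak rate quoted above: 258 CMS collections
# in one hour is roughly one new concurrent cycle every 14 seconds.
collections_per_hour = 258
seconds_between = 3600 / collections_per_hour
print(round(seconds_between, 1))  # prints: 14.0
```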
>
> *Event 2*
>
> The second event is just as confusing to me as the first. At a 
> certain point in time a Full GC takes place on *all 4 nodes*, and I 
> cannot find a reason for it. Here are the relevant parts:
>
> N1
>
> 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: 
> 2618493K->92197K(2831168K), 0.3007160 secs] 
> 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 
> sys=0.00, real=0.30 secs]
>
> 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: 
> 501963K->496288K(7340032K), 4.2140085 secs] 
> 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], 
> 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs]
>
> 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: 
> 2516608K->48243K(2831168K), 0.2649174 secs] 
> 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 
> sys=0.00, real=0.27 secs]
>
> N2
>
> 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: 
> 924566K->111068K(2831168K), 0.1790573 secs] 
> 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 
> sys=0.00, real=0.19 secs]
>
> 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: 
> 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: 
> user=0.33 sys=0.00, real=0.33 secs]
>
> 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start]
>
> 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: 
> 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs]
>
> 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start]
>
> 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: 
> 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
>
> 2013-12-20T04:57:22.647+0100: 765209.430: 
> [CMS-concurrent-abortable-preclean-start]
>
> CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: 
> 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] 
> [Times: user=4.10 sys=0.03, real=5.41 secs]
>
> 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K 
> (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 
> secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: 
> [class unloading, 0.0769742 secs]765214.993: [scrub symbol & string 
> tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] 
> 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, 
> real=0.22 secs]
>
> 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start]
>
> 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: 
> 0.750/0.750 secs] [Times: user=0.76 sys=0.00, real=0.75 secs]
>
> 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start]
>
> 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: 
> 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
>
> 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: 
> 491947K->464864K(7340032K), 4.5053183 secs] 
> 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], 
> 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs]
>
> 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: 
> 1507774K->58195K(2831168K), 0.2307012 secs] 
> 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 
> sys=0.00, real=0.23 secs]
>
> N3
>
> 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: 
> 2695212K->166641K(2831168K), 0.3278475 secs] 
> 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 
> sys=0.00, real=0.33 secs]
>
> 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: 
> 572844K->457790K(7340032K), 3.7687762 secs] 
> 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], 
> 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 secs]
>
> 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: 
> 1585077K->72146K(2831168K), 0.2632945 secs] 
> 2042868K->529936K(10171200K), 0.2639889 secs] [Times: user=0.41 
> sys=0.00, real=0.27 secs]
>
> N4
>
> 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: 
> 2656327K->119432K(2831168K), 0.2603676 secs] 
> 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 
> sys=0.00, real=0.26 secs]
>
> 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: 
> 566366K->579681K(7340032K), 6.1324841 secs] 
> 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], 
> 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs]
>
> 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: 
> 2516608K->83922K(2831168K), 0.2157015 secs] 
> 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 
> sys=0.00, real=0.22 secs]
>
> Between these Full GCs there are only a few minutes, or in the case 
> of N3 and N4 only a few seconds. I did some research on possible 
> reasons for a Full GC, and this is the list I have gathered so far:
>
> 1. Running out of old gen
>
> 2. Running out of perm gen
>
> 3. Calling System.gc() (indicated by "System" in the output)
>
> 4. Not having enough free space in survivor space to copy objects 
> from Eden (promotion failed)
>
> 5. Running out of old gen before a concurrent collection can free it 
> (concurrent mode failure)
>
> 6. High fragmentation leaving not enough contiguous space for a 
> larger object in old gen
>
> However, none of these six conditions is met by any of the Full GCs 
> shown above. So once again I am lost and do not have an explanation 
> for this.
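
Conditions 1 and 2 can indeed be ruled out numerically; a small sketch
using the before-collection occupancy figures copied from the Full GC
lines quoted above:

```python
# Occupancy at the moment of each Full GC, copied from the quoted logs:
# (old_before_K, old_capacity_K, perm_before_K, perm_capacity_K)
full_gcs = {
    "N1": (501963, 7340032, 203781, 524288),
    "N2": (491947, 7340032, 183511, 524288),
    "N3": (572844, 7340032, 181711, 524288),
    "N4": (566366, 7340032, 190240, 524288),
}

for node, (old, old_cap, perm, perm_cap) in full_gcs.items():
    print(f"{node}: old gen {100 * old / old_cap:.1f}% full, "
          f"perm gen {100 * perm / perm_cap:.1f}% full")
```

Old gen sits below 8% and perm gen below 40% of capacity on every node
at the moment of the Full GC, so neither generation was close to full.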
>
> If you need the full logs for further analysis please let me know.
>
> Kind Regards
>
> Denny
>
>
> Werum Software & Systems AG
>
> Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany
>
> Tel. +49(0)4131/8900-983 | Fax +49(0)4131/8900-20
>
> mailto:denny.kettwig at werum.de | http://www.werum.de <http://www.werum.de/>
>
> VAT No. DE 116 083 850 | RG Lueneburg HRB 2262
>
> Chairman of Supervisory Board: Johannes Zimmermann
>
> Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter 
> Subel
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


