Continuous CMS Collections Followed By Concurrent Mode Failure

Tue Jul 8 07:06:22 UTC 2014

Hi Elliot

As Ramki so astutely suggested, you need to add -XX:+CMSClassUnloadingEnabled.

Ramki, should we file a bug for the ParNew (promotion failed) being corrupted by a CMS cycle?
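
For anyone who wants to check the occupancy figures in the CMS-initial-mark records quoted below, here is a minimal sketch. The class and method names (CmsOccupancy, occupancyPercent) are made up for illustration and are not part of any JDK tool. It assumes the record layout shown in this thread's logs, "CMS-initial-mark: <used>K(<capacity>K)", and simply divides used by capacity.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class CmsOccupancy {
    // Matches e.g. "[1 CMS-initial-mark: 1920286K(1926784K)]" as printed in the logs below.
    private static final Pattern INITIAL_MARK =
            Pattern.compile("CMS-initial-mark: (\\d+)K\\((\\d+)K\\)");

    /** Returns old-generation occupancy in percent, or -1 if the line has no initial-mark record. */
    static double occupancyPercent(String logLine) {
        Matcher m = INITIAL_MARK.matcher(logLine);
        if (!m.find()) {
            return -1.0;
        }
        long usedK = Long.parseLong(m.group(1));
        long capacityK = Long.parseLong(m.group(2));
        return 100.0 * usedK / capacityK;
    }

    public static void main(String[] args) {
        // Initial-mark record copied from the concurrent mode failure log in this thread.
        String line = "2014-06-10T22:56:18.793-0700: 4999527.565: [GC [1 CMS-initial-mark: "
                + "1920286K(1926784K)] 2051254K(2080128K), 0.3388330 secs]";
        System.out.println(String.format(Locale.ROOT, "%.2f%%", occupancyPercent(line)));
    }
}
```

For the record above this works out to an old generation that is more than 99% full at the start of the cycle, which is why a new CMS cycle starts as soon as the previous one finishes.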

Regards,
Kirk

On Jul 8, 2014, at 8:18 AM, Elliot Barlas <Elliot.Barlas at citrix.com> wrote:

> JDK and JVM options:
> 
> $ /usr/java/jdk1.7.0_51/bin/java -version
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> 
> -Xms2048m
> -Xmx2048m
> -XX:PermSize=256m
> -XX:MaxPermSize=256m
> -XX:+UseConcMarkSweepGC
> -XX:+PrintClassHistogram
> -XX:+DisableExplicitGC
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:+PrintTenuringDistribution
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=<logs dir>
> -Xloggc:<log path>
> 
> From: Kirk Pepperdine [kirk at kodewerk.com]
> Sent: Monday, July 07, 2014 11:13 PM
> To: Srinivas Ramakrishna
> Cc: Elliot Barlas; hotspot-gc-dev at openjdk.java.net
> Subject: Re: Continuous CMS Collections Followed By Concurrent Mode Failure
> 
> Hi Ramki and Elliot,
> 
> That was actually my guess. There is only 1 perm record in the file, and although it shows that perm is grossly over-sized (ok, there is only one record ;-)), all of the recovery comes from the CMF, which suggests perm is involved. All of the other CMS cycles are clearly due to tenured never dropping below the initiating occupancy fraction. Even without a young collection, the initial marks constantly report an occupancy of >1920xxxK of 1926xxxK. Oddly enough, the ParNew collections only once promoted enough to trip the concurrent mode failure, and each of the CMS cycles (without an intervening ParNew) seems to recover only about 1xxK per cycle.
> 
> 2014-06-10T22:56:18.793-0700: 4999527.565: [GC [1 CMS-initial-mark: 1920286K(1926784K)] 2051254K(2080128K), 0.3388330 secs] [Times: user=0.33 sys=0.00, real=0.33 secs] 
> …...
> 2014-06-10T22:56:26.242-0700: 4999535.014: [GC2014-06-10T22:56:26.242-0700: 4999535.014: [ParNew2014-06-10T22:56:26.256-0700: 4999535.028: [CMS-concurrent-abortable-preclean: 1.948/3.114 secs] [Times: user=1.93 sys=0.12, real=3.11 secs] 
>  (promotion failed)
> Desired survivor size 8716288 bytes, new threshold 6 (max 6)
> - age   1:    1036320 bytes,    1036320 total
> - age   2:     825248 bytes,    1861568 total
> - age   3:     119024 bytes,    1980592 total
> - age   4:     113784 bytes,    2094376 total
> - age   5:     129024 bytes,    2223400 total
> - age   6:     154976 bytes,    2378376 total
> : 141769K->140729K(153344K), 0.3807730 secs]2014-06-10T22:56:26.623-0700: 4999535.395: [CMS
>  (concurrent mode failure): 1920816K->50773K(1926784K), 28.3938140 secs] 2062055K->50773K(2080128K), [CMS Perm : 48657K->41071K(262144K)], 28.7750370 secs] [Times: user=1.65 sys=0.03, real=28.78 secs] 
> 2014-06-10T22:58:41.361-0700: 4999670.133: [GC2014-06-10T22:58:41.361-0700: 4999670.133: [ParNew
> Desired survivor size 8716288 bytes, new threshold 6 (max 6)
> - age   1:    1239232 bytes,    1239232 total
> : 136320K->2239K(153344K), 0.0223680 secs] 187093K->53012K(2080128K), 0.0226340 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 
> 
> Note the corrupted formatting of the ParNew (promotion failed).
> 
> Elliot, can you post the jdk version you are using?
> 
> Regards,
> Kirk
> 
> 
> On Jul 8, 2014, at 2:37 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
> 
>> I haven't looked at your log, but from your description I suspect you need to enable class unloading. That's one thing a STW GC does by default that CMS doesn't (at least until 7uXX).
>> 
>> -XX:+CMSClassUnloadingEnabled
>> 
>> -- ramki
>> 
>> 
>> On Mon, Jul 7, 2014 at 10:39 AM, Elliot Barlas <Elliot.Barlas at citrix.com> wrote:
>> Hi all, I have a question about CMS collections and I'm hoping you can help.
>> 
>> The GC log for my Java application indicates continuous CMS GC followed by a concurrent mode failure stop-the-world collection that reclaims nearly the entire heap.
>> 
>> Why are the CMS collections failing to clear the old generation? Why is a concurrent mode failure stop-the-world collection required?
>> 
>> CMS collections like the one below occurred continuously for several days before a concurrent mode failure finally forced a stop-the-world collection that cleared space. Notice how the CMS collections recover almost no space, while the collection following the promotion failure reduces the old generation from 1.92 GB to 50.7 MB.
>> 
>> Is it due to objects kept alive by dead, uncollected objects in the permanent generation, which are only discarded during a STW collection? Should I consider using -XX:+CMSClassUnloadingEnabled to address this?
>> 
>> 
>> ----- Complete CMS collection in GC log -----
>> 
>> 2014-06-10T22:54:45.999-0700: 4999434.771: [GC [1 CMS-initial-mark: 1920327K(1926784K)] 2050302K(2080128K), 0.3369430 secs] [Times: user=0.34 sys=0.00, real=0.33 secs] 
>> 2014-06-10T22:54:46.338-0700: 4999435.111: [CMS-concurrent-mark-start]
>> 2014-06-10T22:54:50.543-0700: 4999439.315: [CMS-concurrent-mark: 4.204/4.204 secs] [Times: user=4.21 sys=0.08, real=4.20 secs] 
>> 2014-06-10T22:54:50.543-0700: 4999439.315: [CMS-concurrent-preclean-start]
>> 2014-06-10T22:54:50.573-0700: 4999439.345: [CMS-concurrent-preclean: 0.023/0.030 secs] [Times: user=0.02 sys=0.00, real=0.04 secs] 
>> 2014-06-10T22:54:50.573-0700: 4999439.346: [CMS-concurrent-abortable-preclean-start]
>> 2014-06-10T22:54:54.599-0700: 4999443.371: [GC2014-06-10T22:54:54.599-0700: 4999443.372:    [ParNew
>> Desired survivor size 8716288 bytes, new threshold 6 (max 6)
>> - age   1:    1410440 bytes,    1410440 total
>> - age   2:     181888 bytes,    1592328 total
>> - age   3:     117864 bytes,    1710192 total
>> - age   4:     136792 bytes,    1846984 total
>> - age   5:     161296 bytes,    2008280 total
>> - age   6:    2488416 bytes,    4496696 total
>> : 141989K->5449K(153344K), 0.1317090 secs] 2062317K->1925911K(2080128K), 0.1321970 secs]    [Times: user=0.23 sys=0.01, real=0.14 secs] 
>>  CMS: abort preclean due to time 2014-06-10T22:54:55.606-0700: 4999444.378: [CMS-concurrent-abortable-preclean: 2.600/5.033 secs] [Times: user=2.88 sys=0.08, real=5.03 secs] 
>> 2014-06-10T22:54:55.611-0700: 4999444.384: [GC[YG occupancy: 10356 K (153344 K)]2014-06-10T22:54:55.612-0700: 4999444.384: [Rescan (parallel) , 0.1665620 secs]2014-06-10T22:54:55.778-0700: 4999444.550: [weak refs processing, 0.0000440 secs]2014-06-10T22:54:55.778-0700: 4999444.551: [scrub string table, 0.0010220 secs] [1 CMS-remark: 1920462K(1926784K)] 1930818K(2080128K), 0.1678710 secs] [Times: user=0.28 sys=0.00, real=0.17 secs] 
>> 2014-06-10T22:54:55.780-0700: 4999444.552: [CMS-concurrent-sweep-start]
>> 2014-06-10T22:54:57.554-0700: 4999446.326: [CMS-concurrent-sweep: 1.775/1.775 secs]    [Times: user=1.82 sys=0.01, real=1.78 secs] 
>> 2014-06-10T22:54:57.554-0700: 4999446.327: [CMS-concurrent-reset-start]
>> 2014-06-10T22:54:57.564-0700: 4999446.336: [CMS-concurrent-reset: 0.009/0.009 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
>> 
>> 
>> ----- Concurrent mode failure GC log ------
>> 
>> 2014-06-10T22:56:18.793-0700: 4999527.565: [GC [1 CMS-initial-mark: 1920286K(1926784K)] 2051254K(2080128K), 0.3388330 secs] [Times: user=0.33 sys=0.00, real=0.33 secs] 
>> 2014-06-10T22:56:19.132-0700: 4999527.904: [CMS-concurrent-mark-start]
>> 2014-06-10T22:56:23.112-0700: 4999531.884: [CMS-concurrent-mark: 3.976/3.980 secs] [Times: user=4.07 sys=0.04, real=3.99 secs] 
>> 2014-06-10T22:56:23.112-0700: 4999531.885: [CMS-concurrent-preclean-start]
>> 2014-06-10T22:56:23.141-0700: 4999531.914: [CMS-concurrent-preclean: 0.022/0.029 secs] [Times: user=0.01 sys=0.01, real=0.03 secs] 
>> 2014-06-10T22:56:23.141-0700: 4999531.914: [CMS-concurrent-abortable-preclean-start]
>> 2014-06-10T22:56:26.242-0700: 4999535.014: [GC2014-06-10T22:56:26.242-0700: 4999535.014: [ParNew2014-06-10T22:56:26.256-0700: 4999535.028: [CMS-concurrent-abortable-preclean: 1.948/3.114 secs] [Times: user=1.93 sys=0.12, real=3.11 secs] 
>>  (promotion failed)
>> Desired survivor size 8716288 bytes, new threshold 6 (max 6)
>> - age   1:    1036320 bytes,    1036320 total
>> - age   2:     825248 bytes,    1861568 total
>> - age   3:     119024 bytes,    1980592 total
>> - age   4:     113784 bytes,    2094376 total
>> - age   5:     129024 bytes,    2223400 total
>> - age   6:     154976 bytes,    2378376 total
>> : 141769K->140729K(153344K), 0.3807730 secs]2014-06-10T22:56:26.623-0700: 4999535.395: [CMS
>>  (concurrent mode failure): 1920816K->50773K(1926784K), 28.3938140 secs] 2062055K->50773K(2080128K), [CMS Perm : 48657K->41071K(262144K)], 28.7750370 secs] [Times: user=1.65 sys=0.03, real=28.78 secs] 
>> 
>> 
>> ----- Background -----
>> 
>> $ /usr/java/jdk1.7.0_51/bin/java -version
>> java version "1.7.0_51"
>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>> 
>> -Xms2048m
>> -Xmx2048m
>> -XX:PermSize=256m
>> -XX:MaxPermSize=256m
>> -XX:+UseConcMarkSweepGC
>> -XX:+PrintClassHistogram
>> -XX:+DisableExplicitGC
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintTenuringDistribution
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=<logs dir>
>> -Xloggc:<log path>
>> 
>> 
>> Thanks,
>> Elliot
