Very frequent CMS without obvious reason

Thu Nov 8 00:54:41 UTC 2012

We have an application running that after some time starts to do very 
frequent CMS without any reason I can see. Here's a snippet from a GC 
log where soon after starting the CMS a young generation collection 
happens, so that we can easily see the amount of memory currently in use:

06.11.2012 10:48:31 65102.013: [GC [1 CMS-initial-mark: 
1461672K(2469888K)] 2026034K(3125248K), 0.6163870 secs] [Times: 
user=0.62 sys=0.00, real=0.62 secs]
06.11.2012 10:48:31 Total time for which application threads were 
stopped: 0.6173730 seconds
06.11.2012 10:48:31 65102.630: [CMS-concurrent-mark-start]
06.11.2012 10:48:31 Application time: 0.0012740 seconds
06.11.2012 10:48:31 {Heap before GC invocations=912 (full 39):
06.11.2012 10:48:31  par new generation   total 655360K, used 564361K 
[0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
06.11.2012 10:48:31   eden space 524288K, 100% used [0x0000000707c00000, 
0x0000000727c00000, 0x0000000727c00000)
06.11.2012 10:48:31   from space 131072K,  30% used [0x000000072fc00000, 
0x0000000732322670, 0x0000000737c00000)
06.11.2012 10:48:31   to   space 131072K,   0% used [0x0000000727c00000, 
0x0000000727c00000, 0x000000072fc00000)
06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K, 
used 1461672K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used 
746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
06.11.2012 10:48:31 65102.633: [GC 65102.633: [ParNew
06.11.2012 10:48:31 Desired survivor size 67108864 bytes, new threshold 
15 (max 15)
06.11.2012 10:48:31 - age   1:    5295680 bytes,    5295680 total
06.11.2012 10:48:31 - age   2:    2371208 bytes,    7666888 total
06.11.2012 10:48:31 - age   3:     798848 bytes,    8465736 total
06.11.2012 10:48:31 - age   4:     989720 bytes,    9455456 total
06.11.2012 10:48:31 - age   5:    1837168 bytes,   11292624 total
06.11.2012 10:48:31 - age   6:     447072 bytes,   11739696 total
06.11.2012 10:48:31 - age   7:     868304 bytes,   12608000 total
06.11.2012 10:48:31 - age   8:    1673520 bytes,   14281520 total
06.11.2012 10:48:31 - age   9:    3109384 bytes,   17390904 total
06.11.2012 10:48:31 - age  10:    1724424 bytes,   19115328 total
06.11.2012 10:48:31 - age  11:    1206352 bytes,   20321680 total
06.11.2012 10:48:31 - age  12:     908944 bytes,   21230624 total
06.11.2012 10:48:31 - age  13:    1772568 bytes,   23003192 total
06.11.2012 10:48:31 - age  14:    1310648 bytes,   24313840 total
06.11.2012 10:48:31 - age  15:     867560 bytes,   25181400 total
06.11.2012 10:48:31 : 564361K->30541K(655360K), 0.0292050 secs] 
2026034K->1493398K(3125248K), 0.0293500 secs] [Times: user=0.21 
sys=0.00, real=0.03 secs]
06.11.2012 10:48:31 Heap after GC invocations=913 (full 39):
06.11.2012 10:48:31  par new generation   total 655360K, used 30541K 
[0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
06.11.2012 10:48:31   eden space 524288K,   0% used [0x0000000707c00000, 
0x0000000707c00000, 0x0000000727c00000)
06.11.2012 10:48:31   from space 131072K,  23% used [0x0000000727c00000, 
0x00000007299d35f0, 0x000000072fc00000)
06.11.2012 10:48:31   to   space 131072K,   0% used [0x000000072fc00000, 
0x000000072fc00000, 0x0000000737c00000)
06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K, 
used 1462856K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used 
746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
06.11.2012 10:48:31 }

Since neither tenured nor perm are nearly full I don't see a reason for 
this CMS. After some time CMS happens very frequently, multiple times 
per minute until finally it runs more or less continuously and the JVM 
needs to be restarted.

JVM flags are:

argv[0]: /opt/java/jdk1.6.0_35/bin/java
... application system property ...
argv[2]: -server
argv[3]: -Dsun.rmi.dgc.client.gcInterval=28800000
... application system property ...
argv[5]: -XX:+UseParNewGC
argv[6]: -XX:ParallelGCThreads=8
argv[7]: -XX:+UseConcMarkSweepGC
argv[8]: -XX:SurvivorRatio=4
argv[9]: -XX:+CMSClassUnloadingEnabled
argv[10]: -XX:CMSInitiatingOccupancyFraction=80
argv[11]: -XX:+UseCMSInitiatingOccupancyOnly
argv[12]: -Dsun.reflect.noInflation=true
argv[13]: -XX:+TraceClassLoading
argv[14]: -verbose:gc
argv[15]: -XX:+PrintGCApplicationConcurrentTime
argv[16]: -XX:+PrintGCApplicationStoppedTime
argv[17]: -XX:+PrintGCTimeStamps
argv[18]: -XX:+PrintGCDetails
argv[19]: -XX:+PrintHeapAtGC
argv[20]: -XX:+PrintTenuringDistribution
argv[21]: -Xms3052m
argv[22]: -Xmx3052m
argv[23]: -XX:PermSize=792m
argv[24]: -XX:MaxPermSize=792m
argv[25]: -XX:NewSize=640m
argv[26]: -XX:MaxNewSize=640m
... application system properties ...
argv[38]: -Dcom.sun.management.jmxremote
argv[39]: -Dcom.sun.management.jmxremote.port=1090
argv[40]: -Dcom.sun.management.jmxremote.ssl=false
argv[41]: -Dcom.sun.management.jmxremote.authenticate=true
argv[42]: 
-Dcom.sun.management.jmxremote.password.file=/.../jmxremote.password
argv[43]: -Dcom.sun.management.jmxremote.access.file=/.../jmxremote.access
argv[44]: -XX:-UseBiasedLocking
argv[45]: -XX:-EliminateLocks
argv[46]: -XX:-DoEscapeAnalysis

- There are no messages about promotion failures or concurrent mode 
failures in the logs

- System.gc() would produce different log messages

- DGC would not run in intervals that get shorter and shorter down to 
seconds

- perm is far from being full

- tenured is well below the Occupancy Fraction (always below 60%)

How can we find out the CMS cause here?

Note: args 44-46 are there because of historical reasons (load test 
results). Even if for current JVM releases it might be better to remove 
them, I don't see how they would explain frequent CMS runs.

Thanks for any hint!

Regards,

Rainer