Very frequent CMS without obvious reason

Thu Nov 8 01:35:13 UTC 2012

In fact, it looks like your permanent generation *is* nearly out of space.  746784K/811008K is 92% full, which is over the default CMSTriggerPermRatio of 80% (in JDK-6).  What happens if you make the permanent generation larger?

			... peter

Rainer Jung wrote:
> We have an application running that after some time starts to do very frequent CMS without any reason I can see. Here's a snippet from a GC log where soon after starting the CMS a young generation collection happens, so that we can easily see the amount of memory currently in use:
> 
> 06.11.2012 10:48:31 65102.013: [GC [1 CMS-initial-mark: 1461672K(2469888K)] 2026034K(3125248K), 0.6163870 secs] [Times: user=0.62 sys=0.00, real=0.62 secs]
> 06.11.2012 10:48:31 Total time for which application threads were stopped: 0.6173730 seconds
> 06.11.2012 10:48:31 65102.630: [CMS-concurrent-mark-start]
> 06.11.2012 10:48:31 Application time: 0.0012740 seconds
> 06.11.2012 10:48:31 {Heap before GC invocations=912 (full 39):
> 06.11.2012 10:48:31  par new generation   total 655360K, used 564361K [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
> 06.11.2012 10:48:31   eden space 524288K, 100% used [0x0000000707c00000, 0x0000000727c00000, 0x0000000727c00000)
> 06.11.2012 10:48:31   from space 131072K,  30% used [0x000000072fc00000, 0x0000000732322670, 0x0000000737c00000)
> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x0000000727c00000, 0x0000000727c00000, 0x000000072fc00000)
> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K, used 1461672K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
> 06.11.2012 10:48:31 65102.633: [GC 65102.633: [ParNew
> 06.11.2012 10:48:31 Desired survivor size 67108864 bytes, new threshold 15 (max 15)
> 06.11.2012 10:48:31 - age   1:    5295680 bytes,    5295680 total
> 06.11.2012 10:48:31 - age   2:    2371208 bytes,    7666888 total
> 06.11.2012 10:48:31 - age   3:     798848 bytes,    8465736 total
> 06.11.2012 10:48:31 - age   4:     989720 bytes,    9455456 total
> 06.11.2012 10:48:31 - age   5:    1837168 bytes,   11292624 total
> 06.11.2012 10:48:31 - age   6:     447072 bytes,   11739696 total
> 06.11.2012 10:48:31 - age   7:     868304 bytes,   12608000 total
> 06.11.2012 10:48:31 - age   8:    1673520 bytes,   14281520 total
> 06.11.2012 10:48:31 - age   9:    3109384 bytes,   17390904 total
> 06.11.2012 10:48:31 - age  10:    1724424 bytes,   19115328 total
> 06.11.2012 10:48:31 - age  11:    1206352 bytes,   20321680 total
> 06.11.2012 10:48:31 - age  12:     908944 bytes,   21230624 total
> 06.11.2012 10:48:31 - age  13:    1772568 bytes,   23003192 total
> 06.11.2012 10:48:31 - age  14:    1310648 bytes,   24313840 total
> 06.11.2012 10:48:31 - age  15:     867560 bytes,   25181400 total
> 06.11.2012 10:48:31 : 564361K->30541K(655360K), 0.0292050 secs] 2026034K->1493398K(3125248K), 0.0293500 secs] [Times: user=0.21 sys=0.00, real=0.03 secs]
> 06.11.2012 10:48:31 Heap after GC invocations=913 (full 39):
> 06.11.2012 10:48:31  par new generation   total 655360K, used 30541K [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
> 06.11.2012 10:48:31   eden space 524288K,   0% used [0x0000000707c00000, 0x0000000707c00000, 0x0000000727c00000)
> 06.11.2012 10:48:31   from space 131072K,  23% used [0x0000000727c00000, 0x00000007299d35f0, 0x000000072fc00000)
> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x000000072fc00000, 0x000000072fc00000, 0x0000000737c00000)
> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K, used 1462856K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
> 06.11.2012 10:48:31 }
> 
> Since neither tenured nor perm are nearly full I don't see a reason for this CMS. After some time CMS happens very frequently, multiple times per minute until finally it runs more or less continuously and the JVM needs to be restarted.
> 
> JVM flags are:
> 
> argv[0]: /opt/java/jdk1.6.0_35/bin/java
> ... application system property ...
> argv[2]: -server
> argv[3]: -Dsun.rmi.dgc.client.gcInterval=28800000
> ... application system property ...
> argv[5]: -XX:+UseParNewGC
> argv[6]: -XX:ParallelGCThreads=8
> argv[7]: -XX:+UseConcMarkSweepGC
> argv[8]: -XX:SurvivorRatio=4
> argv[9]: -XX:+CMSClassUnloadingEnabled
> argv[10]: -XX:CMSInitiatingOccupancyFraction=80
> argv[11]: -XX:+UseCMSInitiatingOccupancyOnly
> argv[12]: -Dsun.reflect.noInflation=true
> argv[13]: -XX:+TraceClassLoading
> argv[14]: -verbose:gc
> argv[15]: -XX:+PrintGCApplicationConcurrentTime
> argv[16]: -XX:+PrintGCApplicationStoppedTime
> argv[17]: -XX:+PrintGCTimeStamps
> argv[18]: -XX:+PrintGCDetails
> argv[19]: -XX:+PrintHeapAtGC
> argv[20]: -XX:+PrintTenuringDistribution
> argv[21]: -Xms3052m
> argv[22]: -Xmx3052m
> argv[23]: -XX:PermSize=792m
> argv[24]: -XX:MaxPermSize=792m
> argv[25]: -XX:NewSize=640m
> argv[26]: -XX:MaxNewSize=640m
> ... application system properties ...
> argv[38]: -Dcom.sun.management.jmxremote
> argv[39]: -Dcom.sun.management.jmxremote.port=1090
> argv[40]: -Dcom.sun.management.jmxremote.ssl=false
> argv[41]: -Dcom.sun.management.jmxremote.authenticate=true
> argv[42]: -Dcom.sun.management.jmxremote.password.file=/.../jmxremote.password
> argv[43]: -Dcom.sun.management.jmxremote.access.file=/.../jmxremote.access
> argv[44]: -XX:-UseBiasedLocking
> argv[45]: -XX:-EliminateLocks
> argv[46]: -XX:-DoEscapeAnalysis
> 
> - There are no messages about promotion failures or concurrent mode failures in the logs
> 
> - System.gc() would produce different log messages
> 
> - DGC would not run in intervals that get shorter and shorter down to seconds
> 
> - perm is far from being full
> 
> - tenured is well below the Occupancy Fraction (always below 60%)
> 
> How can we find out the CMS cause here?
> 
> Note: args 44-46 are there because of historical reasons (load test results). Even if for current JVM releases it might be better to remove them, I don't see how they would explain frequent CMS runs.
> 
> Thanks for any hint!
> 
> Regards,
> 
> Rainer