Very frequent CMS without obvious reason

Thu Nov 8 01:36:20 UTC 2012

Replying to self: Reading some similar older threads I found a seemingly 
unrelated hint about a full code cache. Indeed I can find the messages 
about the full code cache. But:

- the frequent CMS runs happen before the code cache gets filled

- even if the bigger part of the performance problems might be coming 
from the full code cache I'd still be interested to understand, why the 
CMS runs so frequently.

Thanks!

Rainer

On 08.11.2012 01:54, Rainer Jung wrote:
> We have an application running that after some time starts to do very
> frequent CMS without any reason I can see. Here's a snippet from a GC
> log where soon after starting the CMS a young generation collection
> happens, so that we can easily see the amount of memory currently in use:
>
> 06.11.2012 10:48:31 65102.013: [GC [1 CMS-initial-mark:
> 1461672K(2469888K)] 2026034K(3125248K), 0.6163870 secs] [Times:
> user=0.62 sys=0.00, real=0.62 secs]
> 06.11.2012 10:48:31 Total time for which application threads were
> stopped: 0.6173730 seconds
> 06.11.2012 10:48:31 65102.630: [CMS-concurrent-mark-start]
> 06.11.2012 10:48:31 Application time: 0.0012740 seconds
> 06.11.2012 10:48:31 {Heap before GC invocations=912 (full 39):
> 06.11.2012 10:48:31  par new generation   total 655360K, used 564361K
> [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
> 06.11.2012 10:48:31   eden space 524288K, 100% used [0x0000000707c00000,
> 0x0000000727c00000, 0x0000000727c00000)
> 06.11.2012 10:48:31   from space 131072K,  30% used [0x000000072fc00000,
> 0x0000000732322670, 0x0000000737c00000)
> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x0000000727c00000,
> 0x0000000727c00000, 0x000000072fc00000)
> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K,
> used 1461672K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used
> 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
> 06.11.2012 10:48:31 65102.633: [GC 65102.633: [ParNew
> 06.11.2012 10:48:31 Desired survivor size 67108864 bytes, new threshold
> 15 (max 15)
> 06.11.2012 10:48:31 - age   1:    5295680 bytes,    5295680 total
> 06.11.2012 10:48:31 - age   2:    2371208 bytes,    7666888 total
> 06.11.2012 10:48:31 - age   3:     798848 bytes,    8465736 total
> 06.11.2012 10:48:31 - age   4:     989720 bytes,    9455456 total
> 06.11.2012 10:48:31 - age   5:    1837168 bytes,   11292624 total
> 06.11.2012 10:48:31 - age   6:     447072 bytes,   11739696 total
> 06.11.2012 10:48:31 - age   7:     868304 bytes,   12608000 total
> 06.11.2012 10:48:31 - age   8:    1673520 bytes,   14281520 total
> 06.11.2012 10:48:31 - age   9:    3109384 bytes,   17390904 total
> 06.11.2012 10:48:31 - age  10:    1724424 bytes,   19115328 total
> 06.11.2012 10:48:31 - age  11:    1206352 bytes,   20321680 total
> 06.11.2012 10:48:31 - age  12:     908944 bytes,   21230624 total
> 06.11.2012 10:48:31 - age  13:    1772568 bytes,   23003192 total
> 06.11.2012 10:48:31 - age  14:    1310648 bytes,   24313840 total
> 06.11.2012 10:48:31 - age  15:     867560 bytes,   25181400 total
> 06.11.2012 10:48:31 : 564361K->30541K(655360K), 0.0292050 secs]
> 2026034K->1493398K(3125248K), 0.0293500 secs] [Times: user=0.21
> sys=0.00, real=0.03 secs]
> 06.11.2012 10:48:31 Heap after GC invocations=913 (full 39):
> 06.11.2012 10:48:31  par new generation   total 655360K, used 30541K
> [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
> 06.11.2012 10:48:31   eden space 524288K,   0% used [0x0000000707c00000,
> 0x0000000707c00000, 0x0000000727c00000)
> 06.11.2012 10:48:31   from space 131072K,  23% used [0x0000000727c00000,
> 0x00000007299d35f0, 0x000000072fc00000)
> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x000000072fc00000,
> 0x000000072fc00000, 0x0000000737c00000)
> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K,
> used 1462856K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used
> 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
> 06.11.2012 10:48:31 }
>
> Since neither tenured nor perm are nearly full I don't see a reason for
> this CMS. After some time CMS happens very frequently, multiple times
> per minute until finally it runs more or less continuously and the JVM
> needs to be restarted.
>
> JVM flags are:
>
> argv[0]: /opt/java/jdk1.6.0_35/bin/java
> ... application system property ...
> argv[2]: -server
> argv[3]: -Dsun.rmi.dgc.client.gcInterval=28800000
> ... application system property ...
> argv[5]: -XX:+UseParNewGC
> argv[6]: -XX:ParallelGCThreads=8
> argv[7]: -XX:+UseConcMarkSweepGC
> argv[8]: -XX:SurvivorRatio=4
> argv[9]: -XX:+CMSClassUnloadingEnabled
> argv[10]: -XX:CMSInitiatingOccupancyFraction=80
> argv[11]: -XX:+UseCMSInitiatingOccupancyOnly
> argv[12]: -Dsun.reflect.noInflation=true
> argv[13]: -XX:+TraceClassLoading
> argv[14]: -verbose:gc
> argv[15]: -XX:+PrintGCApplicationConcurrentTime
> argv[16]: -XX:+PrintGCApplicationStoppedTime
> argv[17]: -XX:+PrintGCTimeStamps
> argv[18]: -XX:+PrintGCDetails
> argv[19]: -XX:+PrintHeapAtGC
> argv[20]: -XX:+PrintTenuringDistribution
> argv[21]: -Xms3052m
> argv[22]: -Xmx3052m
> argv[23]: -XX:PermSize=792m
> argv[24]: -XX:MaxPermSize=792m
> argv[25]: -XX:NewSize=640m
> argv[26]: -XX:MaxNewSize=640m
> ... application system properties ...
> argv[38]: -Dcom.sun.management.jmxremote
> argv[39]: -Dcom.sun.management.jmxremote.port=1090
> argv[40]: -Dcom.sun.management.jmxremote.ssl=false
> argv[41]: -Dcom.sun.management.jmxremote.authenticate=true
> argv[42]:
> -Dcom.sun.management.jmxremote.password.file=/.../jmxremote.password
> argv[43]: -Dcom.sun.management.jmxremote.access.file=/.../jmxremote.access
> argv[44]: -XX:-UseBiasedLocking
> argv[45]: -XX:-EliminateLocks
> argv[46]: -XX:-DoEscapeAnalysis
>
> - There are no messages about promotion failures or concurrent mode
> failures in the logs
>
> - System.gc() would produce different log messages
>
> - DGC would not run in intervals that get shorter and shorter down to
> seconds
>
> - perm is far from being full
>
> - tenured is well below the Occupancy Fraction (always below 60%)
>
> How can we find out the CMS cause here?
>
> Note: args 44-46 are there because of historical reasons (load test
> results). Even if for current JVM releases it might be better to remove
> them, I don't see how they would explain frequent CMS runs.
>
> Thanks for any hint!
>
> Regards,
>
> Rainer