Very frequent CMS without obvious reason

Bernd Eckenfels bernd-2012 at eckenfels.net
Thu Nov 8 02:47:54 UTC 2012


I reported the same problem - and it happened again on this particular system - always after a week of normal operation. 

The time-critical processes have meanwhile been migrated away, so unlike before we did not need to restart the VM. What I can observe is that once the erratic behaviour starts, it stays in that mode for days.

PermGen and CodeCache have sometimes been close to the trigger threshold, but in other situations they were below it. My final hope is to go with -XX:+UseCMSInitiatingOccupancyOnly (but it does take some time to get that into production).
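
To see how close the individual pools really get when the erratic phase begins, the occupancy can be sampled with the standard memory pool MBeans, either from inside the VM or through the jmxremote setup that is already in the flag list below. This is only a minimal sketch; the pool names it prints (e.g. "CMS Old Gen", "CMS Perm Gen", "Code Cache") are HotSpot-specific, so treat them as an assumption:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Prints used/max for every memory pool so the figures can be compared
// against the configured CMSInitiatingOccupancyFraction when the frequent
// CMS cycles start.
public class PoolOccupancy {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null || u.getMax() <= 0) {
                continue; // some pools do not report a maximum
            }
            double pct = 100.0 * u.getUsed() / u.getMax();
            System.out.printf("%-20s %10d / %10d KB (%.1f%% of max)%n",
                    pool.getName(), u.getUsed() / 1024, u.getMax() / 1024, pct);
        }
    }
}

Nothing fancy, but it makes it easy to check whether any pool is actually near its limit at the moment the frequent cycles set in.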

Bernd

-- 
bernd.eckenfels.net

On 08.11.2012, at 02:36, Rainer Jung <rainer.jung at kippdata.de> wrote:

> Replying to self: reading some similar older threads, I found a seemingly
> unrelated hint about a full code cache. Indeed I can find the messages
> about the full code cache. But:
> 
> - the frequent CMS runs happen before the code cache gets filled
> 
> - even if the bigger part of the performance problems might come from the
> full code cache, I'd still be interested to understand why CMS runs so
> frequently.
> 
> Thanks!
> 
> Rainer
> 
> On 08.11.2012 01:54, Rainer Jung wrote:
>> We have an application that after some time starts to do very frequent
>> CMS cycles without any reason I can see. Here's a snippet from a GC log
>> where a young generation collection happens soon after the CMS cycle
>> starts, so we can easily see the amount of memory currently in use:
>> 
>> 06.11.2012 10:48:31 65102.013: [GC [1 CMS-initial-mark:
>> 1461672K(2469888K)] 2026034K(3125248K), 0.6163870 secs] [Times:
>> user=0.62 sys=0.00, real=0.62 secs]
>> 06.11.2012 10:48:31 Total time for which application threads were
>> stopped: 0.6173730 seconds
>> 06.11.2012 10:48:31 65102.630: [CMS-concurrent-mark-start]
>> 06.11.2012 10:48:31 Application time: 0.0012740 seconds
>> 06.11.2012 10:48:31 {Heap before GC invocations=912 (full 39):
>> 06.11.2012 10:48:31  par new generation   total 655360K, used 564361K
>> [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
>> 06.11.2012 10:48:31   eden space 524288K, 100% used [0x0000000707c00000,
>> 0x0000000727c00000, 0x0000000727c00000)
>> 06.11.2012 10:48:31   from space 131072K,  30% used [0x000000072fc00000,
>> 0x0000000732322670, 0x0000000737c00000)
>> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x0000000727c00000,
>> 0x0000000727c00000, 0x000000072fc00000)
>> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K,
>> used 1461672K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
>> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used
>> 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
>> 06.11.2012 10:48:31 65102.633: [GC 65102.633: [ParNew
>> 06.11.2012 10:48:31 Desired survivor size 67108864 bytes, new threshold
>> 15 (max 15)
>> 06.11.2012 10:48:31 - age   1:    5295680 bytes,    5295680 total
>> 06.11.2012 10:48:31 - age   2:    2371208 bytes,    7666888 total
>> 06.11.2012 10:48:31 - age   3:     798848 bytes,    8465736 total
>> 06.11.2012 10:48:31 - age   4:     989720 bytes,    9455456 total
>> 06.11.2012 10:48:31 - age   5:    1837168 bytes,   11292624 total
>> 06.11.2012 10:48:31 - age   6:     447072 bytes,   11739696 total
>> 06.11.2012 10:48:31 - age   7:     868304 bytes,   12608000 total
>> 06.11.2012 10:48:31 - age   8:    1673520 bytes,   14281520 total
>> 06.11.2012 10:48:31 - age   9:    3109384 bytes,   17390904 total
>> 06.11.2012 10:48:31 - age  10:    1724424 bytes,   19115328 total
>> 06.11.2012 10:48:31 - age  11:    1206352 bytes,   20321680 total
>> 06.11.2012 10:48:31 - age  12:     908944 bytes,   21230624 total
>> 06.11.2012 10:48:31 - age  13:    1772568 bytes,   23003192 total
>> 06.11.2012 10:48:31 - age  14:    1310648 bytes,   24313840 total
>> 06.11.2012 10:48:31 - age  15:     867560 bytes,   25181400 total
>> 06.11.2012 10:48:31 : 564361K->30541K(655360K), 0.0292050 secs]
>> 2026034K->1493398K(3125248K), 0.0293500 secs] [Times: user=0.21
>> sys=0.00, real=0.03 secs]
>> 06.11.2012 10:48:31 Heap after GC invocations=913 (full 39):
>> 06.11.2012 10:48:31  par new generation   total 655360K, used 30541K
>> [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
>> 06.11.2012 10:48:31   eden space 524288K,   0% used [0x0000000707c00000,
>> 0x0000000707c00000, 0x0000000727c00000)
>> 06.11.2012 10:48:31   from space 131072K,  23% used [0x0000000727c00000,
>> 0x00000007299d35f0, 0x000000072fc00000)
>> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x000000072fc00000,
>> 0x000000072fc00000, 0x0000000737c00000)
>> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K,
>> used 1462856K [0x0000000737c00000, 0x00000007ce800000, 0x00000007ce800000)
>> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used
>> 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
>> 06.11.2012 10:48:31 }
>> 
>> Since neither tenured nor perm is nearly full, I don't see a reason for
>> this CMS cycle. After some time CMS happens very frequently, multiple
>> times per minute, until finally it runs more or less continuously and
>> the JVM needs to be restarted.
>> 
>> JVM flags are:
>> 
>> argv[0]: /opt/java/jdk1.6.0_35/bin/java
>> ... application system property ...
>> argv[2]: -server
>> argv[3]: -Dsun.rmi.dgc.client.gcInterval=28800000
>> ... application system property ...
>> argv[5]: -XX:+UseParNewGC
>> argv[6]: -XX:ParallelGCThreads=8
>> argv[7]: -XX:+UseConcMarkSweepGC
>> argv[8]: -XX:SurvivorRatio=4
>> argv[9]: -XX:+CMSClassUnloadingEnabled
>> argv[10]: -XX:CMSInitiatingOccupancyFraction=80
>> argv[11]: -XX:+UseCMSInitiatingOccupancyOnly
>> argv[12]: -Dsun.reflect.noInflation=true
>> argv[13]: -XX:+TraceClassLoading
>> argv[14]: -verbose:gc
>> argv[15]: -XX:+PrintGCApplicationConcurrentTime
>> argv[16]: -XX:+PrintGCApplicationStoppedTime
>> argv[17]: -XX:+PrintGCTimeStamps
>> argv[18]: -XX:+PrintGCDetails
>> argv[19]: -XX:+PrintHeapAtGC
>> argv[20]: -XX:+PrintTenuringDistribution
>> argv[21]: -Xms3052m
>> argv[22]: -Xmx3052m
>> argv[23]: -XX:PermSize=792m
>> argv[24]: -XX:MaxPermSize=792m
>> argv[25]: -XX:NewSize=640m
>> argv[26]: -XX:MaxNewSize=640m
>> ... application system properties ...
>> argv[38]: -Dcom.sun.management.jmxremote
>> argv[39]: -Dcom.sun.management.jmxremote.port=1090
>> argv[40]: -Dcom.sun.management.jmxremote.ssl=false
>> argv[41]: -Dcom.sun.management.jmxremote.authenticate=true
>> argv[42]:
>> -Dcom.sun.management.jmxremote.password.file=/.../jmxremote.password
>> argv[43]: -Dcom.sun.management.jmxremote.access.file=/.../jmxremote.access
>> argv[44]: -XX:-UseBiasedLocking
>> argv[45]: -XX:-EliminateLocks
>> argv[46]: -XX:-DoEscapeAnalysis
>> 
>> - There are no messages about promotion failures or concurrent mode
>> failures in the logs
>> 
>> - System.gc() would produce different log messages
>> 
>> - DGC would not run in intervals that get shorter and shorter down to
>> seconds
>> 
>> - perm is far from being full
>> 
>> - tenured is well below the Occupancy Fraction (always below 60%)
>> 
>> How can we find out the CMS cause here?
>> 
>> Note: args 44-46 are there for historical reasons (load test results).
>> Even though it might be better to remove them on current JVM releases,
>> I don't see how they would explain frequent CMS runs.
>> 
>> Thanks for any hint!
>> 
>> Regards,
>> 
>> Rainer
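
P.S.: Checking the numbers from the quoted initial-mark line against the configured fraction, the tenured generation there is at

  1461672K used / 2469888K capacity  =  ~59%

which is indeed well below CMSInitiatingOccupancyFraction=80 (that would only fire at roughly 1975910K). It might also be worth sampling the affected VM with "jstat -gccause <pid> <interval>"; its LGCC/GCC columns show the cause HotSpot itself reports for the last/current collection.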


