Very frequent CMS without obvious reason

Srinivas Ramakrishna ysr1729 at gmail.com
Thu Nov 8 05:49:31 UTC 2012


I think Peter is right and it's the perm gen exceeding its triggering
threshold:

$ java -XX:+UseConcMarkSweepGC -XX:+PrintFlagsFinal -version | grep -e "Trigger\|Initiating"
     intx CMSInitiatingOccupancyFraction            = -1              {product}
     intx CMSInitiatingPermOccupancyFraction        = -1              {product}
     intx CMSTriggerPermRatio                       = 80              {product}
     intx CMSTriggerRatio                           = 80              {product}
    uintx InitiatingHeapOccupancyPercent            = 45              {product}
     bool UseCMSInitiatingOccupancyOnly             = false           {product}
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)

// The field "_initiating_occupancy" represents the occupancy percentage
// at which we trigger a new collection cycle.  Unless explicitly specified
// via CMSInitiatingOccupancyFraction (argument "io" below), it
// is calculated by:
//
//   Let "f" be MinHeapFreeRatio in
//
//    _initiating_occupancy = 100-f +
//                           f * (CMSTriggerRatio/100)
//   where CMSTriggerRatio is the argument "tr" below.
//
// That is, if we assume the heap is at its desired maximum occupancy at the
// end of a collection, we let CMSTriggerRatio of the (purported) free
// space be allocated before initiating a new collection cycle.
//
void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, intx tr) {
  assert(io <= 100 && tr >= 0 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}

$ java -XX:+UseConcMarkSweepGC -XX:+PrintFlagsFinal -version | grep -e "MinHeapFree"
    uintx MinHeapFreeRatio                          = 40              {product}
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)

So, the initiating occupancy for the perm gen is probably right around
((100 - 40) + (80*40)/100)/100 = 92%.
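
For illustration, here is a tiny standalone Java sketch of that arithmetic.
It only mirrors the C++ logic quoted above, fed with the default flag values
from -XX:+PrintFlagsFinal, and it assumes the perm gen trigger is derived the
same way from CMSInitiatingPermOccupancyFraction and CMSTriggerPermRatio; it
is not HotSpot code:

public class CmsTriggerSketch {
  // Mirrors ConcurrentMarkSweepGeneration::init_initiating_occupancy():
  // io = CMSInitiating(Perm)OccupancyFraction, tr = CMSTrigger(Perm)Ratio.
  static double initiatingOccupancy(int io, int tr, int minHeapFreeRatio) {
    if (io >= 0) {
      return io / 100.0;   // an explicitly set fraction wins
    }
    return ((100 - minHeapFreeRatio)
            + (tr * minHeapFreeRatio) / 100.0) / 100.0;
  }

  public static void main(String[] args) {
    // Defaults above: io = -1, tr = 80, MinHeapFreeRatio = 40
    System.out.println(initiatingOccupancy(-1, 80, 40));   // prints 0.92
  }
}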

Perhaps your logs will tell you whether that's when (i.e. at 92% perm
occupancy) the back-to-back collections started; if so, you might look for
a perm gen leak in your application (class loader) or make your perm gen
larger.
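
If the logs do point at the perm trigger, one possible adjustment (only a
sketch; the sizes and the 95% value below are placeholders, not a
recommendation, and whether this helps depends on what is actually driving
the cycles) would be along the lines of:

$ java -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
       -XX:+UseCMSInitiatingOccupancyOnly \
       -XX:CMSInitiatingPermOccupancyFraction=95 \
       -XX:PermSize=1024m -XX:MaxPermSize=1024m ...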

-- ramki

On Wed, Nov 7, 2012 at 6:47 PM, Bernd Eckenfels <bernd-2012 at eckenfels.net> wrote:

> I reported the same problem - and it happened again on this particular
> system - always after a week of normal operation.
>
> The time-critical processes have meanwhile been migrated away, so we did not
> need to restart the VM, unlike before. What I can observe is that once the
> erratic behaviour starts, it will stay in that mode for days.
>
> PermGen and CodeCache have sometimes been close to the trigger threshold,
> but in other situations they were below it. My final hope is to go with
> UseCMSInitiatingOccupancyOnly (but it does take some time to get that into
> production).
>
> Bernd
>
> --
> bernd.eckenfels.net
>
> Am 08.11.2012 um 02:36 schrieb Rainer Jung <rainer.jung at kippdata.de>:
>
> > Replying to self: Reading some similar older threads I found a seemingly
> > unrelated hint about a full code cache. Indeed I can find the messages
> > about the full code cache. But:
> >
> > - the frequent CMS runs happen before the code cache gets filled
> >
> > - even if the bigger part of the performance problems might be coming from
> > the full code cache, I'd still be interested to understand why the CMS runs
> > so frequently.
> >
> > Thanks!
> >
> > Rainer
> >
> > On 08.11.2012 01:54, Rainer Jung wrote:
> >> We have an application running that after some time starts to do very
> >> frequent CMS without any reason I can see. Here's a snippet from a GC
> >> log where soon after starting the CMS a young generation collection
> >> happens, so that we can easily see the amount of memory currently in use:
> >>
> >> 06.11.2012 10:48:31 65102.013: [GC [1 CMS-initial-mark:
> >> 1461672K(2469888K)] 2026034K(3125248K), 0.6163870 secs] [Times:
> >> user=0.62 sys=0.00, real=0.62 secs]
> >> 06.11.2012 10:48:31 Total time for which application threads were
> >> stopped: 0.6173730 seconds
> >> 06.11.2012 10:48:31 65102.630: [CMS-concurrent-mark-start]
> >> 06.11.2012 10:48:31 Application time: 0.0012740 seconds
> >> 06.11.2012 10:48:31 {Heap before GC invocations=912 (full 39):
> >> 06.11.2012 10:48:31  par new generation   total 655360K, used 564361K
> >> [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
> >> 06.11.2012 10:48:31   eden space 524288K, 100% used [0x0000000707c00000,
> >> 0x0000000727c00000, 0x0000000727c00000)
> >> 06.11.2012 10:48:31   from space 131072K,  30% used [0x000000072fc00000,
> >> 0x0000000732322670, 0x0000000737c00000)
> >> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x0000000727c00000,
> >> 0x0000000727c00000, 0x000000072fc00000)
> >> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K,
> >> used 1461672K [0x0000000737c00000, 0x00000007ce800000,
> >> 0x00000007ce800000)
> >> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used
> >> 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
> >> 06.11.2012 10:48:31 65102.633: [GC 65102.633: [ParNew
> >> 06.11.2012 10:48:31 Desired survivor size 67108864 bytes, new threshold
> >> 15 (max 15)
> >> 06.11.2012 10:48:31 - age   1:    5295680 bytes,    5295680 total
> >> 06.11.2012 10:48:31 - age   2:    2371208 bytes,    7666888 total
> >> 06.11.2012 10:48:31 - age   3:     798848 bytes,    8465736 total
> >> 06.11.2012 10:48:31 - age   4:     989720 bytes,    9455456 total
> >> 06.11.2012 10:48:31 - age   5:    1837168 bytes,   11292624 total
> >> 06.11.2012 10:48:31 - age   6:     447072 bytes,   11739696 total
> >> 06.11.2012 10:48:31 - age   7:     868304 bytes,   12608000 total
> >> 06.11.2012 10:48:31 - age   8:    1673520 bytes,   14281520 total
> >> 06.11.2012 10:48:31 - age   9:    3109384 bytes,   17390904 total
> >> 06.11.2012 10:48:31 - age  10:    1724424 bytes,   19115328 total
> >> 06.11.2012 10:48:31 - age  11:    1206352 bytes,   20321680 total
> >> 06.11.2012 10:48:31 - age  12:     908944 bytes,   21230624 total
> >> 06.11.2012 10:48:31 - age  13:    1772568 bytes,   23003192 total
> >> 06.11.2012 10:48:31 - age  14:    1310648 bytes,   24313840 total
> >> 06.11.2012 10:48:31 - age  15:     867560 bytes,   25181400 total
> >> 06.11.2012 10:48:31 : 564361K->30541K(655360K), 0.0292050 secs]
> >> 2026034K->1493398K(3125248K), 0.0293500 secs] [Times: user=0.21
> >> sys=0.00, real=0.03 secs]
> >> 06.11.2012 10:48:31 Heap after GC invocations=913 (full 39):
> >> 06.11.2012 10:48:31  par new generation   total 655360K, used 30541K
> >> [0x0000000707c00000, 0x0000000737c00000, 0x0000000737c00000)
> >> 06.11.2012 10:48:31   eden space 524288K,   0% used [0x0000000707c00000,
> >> 0x0000000707c00000, 0x0000000727c00000)
> >> 06.11.2012 10:48:31   from space 131072K,  23% used [0x0000000727c00000,
> >> 0x00000007299d35f0, 0x000000072fc00000)
> >> 06.11.2012 10:48:31   to   space 131072K,   0% used [0x000000072fc00000,
> >> 0x000000072fc00000, 0x0000000737c00000)
> >> 06.11.2012 10:48:31  concurrent mark-sweep generation total 2469888K,
> >> used 1462856K [0x0000000737c00000, 0x00000007ce800000,
> >> 0x00000007ce800000)
> >> 06.11.2012 10:48:31  concurrent-mark-sweep perm gen total 811008K, used
> >> 746784K [0x00000007ce800000, 0x0000000800000000, 0x0000000800000000)
> >> 06.11.2012 10:48:31 }
> >>
> >> Since neither tenured nor perm is nearly full, I don't see a reason for
> >> this CMS. After some time CMS happens very frequently, multiple times
> >> per minute, until finally it runs more or less continuously and the JVM
> >> needs to be restarted.
> >>
> >> JVM flags are:
> >>
> >> argv[0]: /opt/java/jdk1.6.0_35/bin/java
> >> ... application system property ...
> >> argv[2]: -server
> >> argv[3]: -Dsun.rmi.dgc.client.gcInterval=28800000
> >> ... application system property ...
> >> argv[5]: -XX:+UseParNewGC
> >> argv[6]: -XX:ParallelGCThreads=8
> >> argv[7]: -XX:+UseConcMarkSweepGC
> >> argv[8]: -XX:SurvivorRatio=4
> >> argv[9]: -XX:+CMSClassUnloadingEnabled
> >> argv[10]: -XX:CMSInitiatingOccupancyFraction=80
> >> argv[11]: -XX:+UseCMSInitiatingOccupancyOnly
> >> argv[12]: -Dsun.reflect.noInflation=true
> >> argv[13]: -XX:+TraceClassLoading
> >> argv[14]: -verbose:gc
> >> argv[15]: -XX:+PrintGCApplicationConcurrentTime
> >> argv[16]: -XX:+PrintGCApplicationStoppedTime
> >> argv[17]: -XX:+PrintGCTimeStamps
> >> argv[18]: -XX:+PrintGCDetails
> >> argv[19]: -XX:+PrintHeapAtGC
> >> argv[20]: -XX:+PrintTenuringDistribution
> >> argv[21]: -Xms3052m
> >> argv[22]: -Xmx3052m
> >> argv[23]: -XX:PermSize=792m
> >> argv[24]: -XX:MaxPermSize=792m
> >> argv[25]: -XX:NewSize=640m
> >> argv[26]: -XX:MaxNewSize=640m
> >> ... application system properties ...
> >> argv[38]: -Dcom.sun.management.jmxremote
> >> argv[39]: -Dcom.sun.management.jmxremote.port=1090
> >> argv[40]: -Dcom.sun.management.jmxremote.ssl=false
> >> argv[41]: -Dcom.sun.management.jmxremote.authenticate=true
> >> argv[42]: -Dcom.sun.management.jmxremote.password.file=/.../jmxremote.password
> >> argv[43]: -Dcom.sun.management.jmxremote.access.file=/.../jmxremote.access
> >> argv[44]: -XX:-UseBiasedLocking
> >> argv[45]: -XX:-EliminateLocks
> >> argv[46]: -XX:-DoEscapeAnalysis
> >>
> >> - There are no messages about promotion failures or concurrent mode
> >> failures in the logs
> >>
> >> - System.gc() would produce different log messages
> >>
> >> - DGC would not run in intervals that get shorter and shorter down to
> >> seconds
> >>
> >> - perm is far from being full
> >>
> >> - tenured is well below the Occupancy Fraction (always below 60%)
> >>
> >> How can we find out the CMS cause here?
> >>
> >> Note: args 44-46 are there for historical reasons (load test results).
> >> Even if it might be better to remove them for current JVM releases, I
> >> don't see how they would explain frequent CMS runs.
> >>
> >> Thanks for any hint!
> >>
> >> Regards,
> >>
> >> Rainer
>