From Dori.Rabin at Starhome.com Thu Nov 4 06:27:57 2010
From: Dori.Rabin at Starhome.com (Rabin Dori)
Date: Thu, 4 Nov 2010 15:27:57 +0200
Subject: problem in gc with incremental mode
Message-ID: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>

Hi,

Once in a while, and for a reason I cannot understand, the CMS collection kicks in too late, which causes a promotion failure and a full GC that takes very long (more than 2 minutes, which causes other problems)...

My question is how to tune the GC flags so that the concurrent sweep always runs concurrently (incremental mode), without long stop-the-world pauses but also without the old generation reaching its maximum capacity.

(I know that in my case CMSInitiatingOccupancyFraction=60 is ignored because of CMSIncrementalMode.)

From the log file we can see that the old generation reaches a size of 835'959K out of roughly 843'000K at the time of the promotion failure (I marked this line in red).

I am running the JVM with the following parameters:

wrapper.java.additional.4=-XX:NewSize=200m
wrapper.java.additional.5=-XX:SurvivorRatio=6
wrapper.java.additional.6=-XX:MaxTenuringThreshold=4
wrapper.java.additional.7=-XX:+CMSIncrementalMode
wrapper.java.additional.8=-XX:+CMSIncrementalPacing
wrapper.java.additional.9=-XX:+DisableExplicitGC
wrapper.java.additional.10=-XX:+UseConcMarkSweepGC
wrapper.java.additional.11=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional.12=-XX:+PrintGCDetails
wrapper.java.additional.13=-XX:+PrintGCTimeStamps
wrapper.java.additional.14=-XX:-TraceClassUnloading
wrapper.java.additional.15=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.16=-verbose:gc
wrapper.java.additional.17=-XX:CMSInitiatingOccupancyFraction=60
wrapper.java.additional.18=-XX:+UseCMSInitiatingOccupancyOnly
wrapper.java.additional.19=-XX:+PrintTenuringDistribution

Extracts from the log file:

INFO | jvm 1 | 2010/11/02 04:54:33 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 1: 544000 bytes, 544000 total
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 2: 346320 bytes, 890320 total
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 3: 262800 bytes, 1153120 total
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 4: 238528 bytes, 1391648 total
INFO | jvm 1 | 2010/11/02 04:54:33 | : 155621K->2065K(179200K), 0.1046330 secs] 988712K->835373K(1022976K) icms_dc=0 , 0.1047500 secs] [Times: user=0.00 sys=0.00, real=0.11 secs]
INFO | jvm 1 | 2010/11/02 04:55:54 | 422097.583: [GC 422097.583: [ParNew
INFO | jvm 1 | 2010/11/02 04:55:54 |
INFO | jvm 1 | 2010/11/02 04:55:54 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 1: 577104 bytes, 577104 total
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 2: 261856 bytes, 838960 total
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 3: 298832 bytes, 1137792 total
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 4: 259176 bytes, 1396968 total
INFO | jvm 1 | 2010/11/02 04:55:54 | : 155664K->2386K(179200K), 0.0498010 secs] 988972K->835920K(1022976K) icms_dc=0 , 0.0499370 secs] [Times: user=0.00 sys=0.00, real=0.05 secs]
INFO | jvm 1 | 2010/11/02 04:57:27 | 422190.993: [GC 422190.993: [ParNew
INFO | jvm 1 | 2010/11/02 04:57:28 |
INFO | jvm 1 | 2010/11/02 04:57:28 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 1: 676656 bytes, 676656 total
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 2: 283376 bytes, 960032 total
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 3: 239472 bytes, 1199504 total
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 4: 264960 bytes, 1464464 total
INFO | jvm 1 | 2010/11/02 04:57:28 | : 155986K->1918K(179200K), 0.0652010 secs] 989520K->835699K(1022976K) icms_dc=0 , 0.0653200 secs] [Times: user=0.01 sys=0.00, real=0.07 secs]
INFO | jvm 1 | 2010/11/02 04:58:54 | 422277.406: [GC 422277.406: [ParNew
INFO | jvm 1 | 2010/11/02 04:58:54 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 1: 615944 bytes, 615944 total
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 2: 334120 bytes, 950064 total
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 3: 276736 bytes, 1226800 total
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 4: 236424 bytes, 1463224 total
INFO | jvm 1 | 2010/11/02 04:58:54 | : 155518K->1928K(179200K), 0.0378730 secs] 989299K->835959K(1022976K) icms_dc=0 , 0.0379920 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
INFO | jvm 1 | 2010/11/02 05:00:23 | 422366.439: [GC 422366.439: [ParNew
INFO | jvm 1 | 2010/11/02 05:00:23 | (promotion failed)
INFO | jvm 1 | 2010/11/02 05:00:23 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 1: 574000 bytes, 574000 total
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 2: 315432 bytes, 889432 total
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 3: 281216 bytes, 1170648 total
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 4: 271776 bytes, 1442424 total
INFO | jvm 1 | 2010/11/02 05:00:23 | : 155528K->155689K(179200K), 0.1007840 secs]422366.540: [CMS
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor121]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor119]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor124]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor123]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor120]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor122]
ERROR | wrapper | 2010/11/02 05:02:37 | JVM appears hung: Timed out waiting for signal from JVM.

Dori Rabin
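A note on the numbers in the log: the old-generation occupancy Dori quotes is not printed directly in these ParNew records, but it can be derived from values that do appear. Rough arithmetic, using only figures from the log above:

  total heap capacity                 1022976K
  young generation (ParNew) capacity   179200K
  old generation capacity             ~843776K   (1022976K - 179200K)

  heap used after the 04:58:54 GC      835959K
  young generation used after that GC    1928K
  old generation used                 ~834031K   (about 99% of 843776K)

So by the last successful minor collection the old generation is effectively full, which is consistent with the promotion failure on the very next ParNew at 05:00:23.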
From y.s.ramakrishna at oracle.com Thu Nov 4 09:51:39 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Nov 2010 09:51:39 -0700
Subject: problem in gc with incremental mode
In-Reply-To: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
Message-ID: <4CD2E49B.1080008@oracle.com>

Hi Dori --

What's the version of JDK you are running? Can you share a complete log?
It appears as though the iCMS "auto-pacing" is, for some reason, not kicking in "soon enough"; one workaround is to turn off auto-pacing and use an explicit duty cycle (which has its own disadvantages).

I'd suggest filing a bug, and including a complete log file showing the problem.

thanks.
-- ramki
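For anyone who wants to try the workaround ramki mentions, the explicit duty cycle is set through the i-cms flags sketched below. The flag names are the standard HotSpot i-cms options of this era; the value 50 is purely illustrative and would need tuning for the workload, so treat this as a sketch rather than a recommendation:

  -XX:+UseConcMarkSweepGC
  -XX:+CMSIncrementalMode
  -XX:-CMSIncrementalPacing
  -XX:CMSIncrementalDutyCycle=50

With CMSIncrementalPacing switched off, the duty cycle is no longer adjusted automatically: CMSIncrementalDutyCycle fixes the percentage of the time between minor collections during which the incremental collector is allowed to run. Too low a value recreates the original problem (CMS cannot keep up); too high a value takes proportionally more CPU away from the application.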
From y.s.ramakrishna at oracle.com Thu Nov 4 09:54:52 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Nov 2010 09:54:52 -0700
Subject: problem in gc with incremental mode
In-Reply-To: <4CD2E49B.1080008@oracle.com>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com>
Message-ID: <4CD2E55C.6060505@oracle.com>

Also, consider whether you really need to use iCMS, or whether you could make do with regular CMS (if you have sufficiently many cores) and scale back the number of parallel threads used by CMS marking to reduce the impact on mutators. This can often be a more suitable configuration for multi-core platforms than iCMS (the latter, because of the way it paces itself, can often carry more floating garbage than regular CMS, where the cycle starts and completes more quickly).

-- ramki
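To make the non-incremental alternative concrete, a sketch of what dropping i-cms would look like for this command line is shown below. This is an illustration only, not a tested recommendation for this workload; ParallelCMSThreads is the flag that controls the concurrent marking threads ramki refers to, and it also appears in Alexander's settings later in the thread:

  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:CMSInitiatingOccupancyFraction=60
  -XX:+UseCMSInitiatingOccupancyOnly
  -XX:ParallelCMSThreads=1

That is, the existing flag set minus -XX:+CMSIncrementalMode and -XX:+CMSIncrementalPacing. Without incremental mode, the CMSInitiatingOccupancyFraction=60 already on the command line is honoured, so a background CMS cycle should start at roughly 60% old-generation occupancy instead of waiting on the i-cms pacer.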
From jon.masamitsu at oracle.com Thu Nov 4 10:49:01 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 04 Nov 2010 10:49:01 -0700
Subject: problem in gc with incremental mode
In-Reply-To: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
Message-ID: <4CD2F20D.3040506@oracle.com>

Which version of the JDK are you using and what type of platform are you running on?
From Alexander.Livitz at on24.com Sun Nov 7 13:24:57 2010
From: Alexander.Livitz at on24.com (Alexander Livitz)
Date: Sun, 7 Nov 2010 13:24:57 -0800
Subject: problem in gc with incremental mode
In-Reply-To: <4CD2F20D.3040506@oracle.com>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2F20D.3040506@oracle.com>
Message-ID: <4C717D4720DE704383A5D24A4D46E1C681D2A9AFDD@P-HQEXCHANGE.on24.com>

We had a similar problem with promotion failures. The issue was completely solved with the following set:

-Xms12g -Xmx12g -Xmn2g \
-XX:PermSize=384m \
-XX:MaxPermSize=384m \
-XX:-UseGCOverheadLimit \
-XX:+DontYieldALot \
-XX:+UseStringCache \
-XX:+DoEscapeAnalysis \
-XX:+AdjustConcurrency \
-XX:+OptimizeStringConcat \
-XX:ReservedCodeCacheSize=64m \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:SurvivorRatio=1 \
-XX:TargetSurvivorRatio=60 \
-XX:MaxTenuringThreshold=7 \
-XX:+ParallelRefProcEnabled \
-XX:ParallelGCThreads=30 \
-XX:ParallelCMSThreads=24 \
-XX:+CMSPrecleanRefLists2 \
-XX:+CMSScavengeBeforeRemark \
-XX:+CMSClassUnloadingEnabled \
-XX:CMSMaxAbortablePrecleanTime=15000 \
-XX:CMSInitiatingOccupancyFraction=65 \

Alex
From jon.masamitsu at oracle.com Sun Nov 7 20:04:18 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Sun, 07 Nov 2010 20:04:18 -0800
Subject: problem in gc with incremental mode
In-Reply-To:
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com>
Message-ID: <4CD776C2.3070601@oracle.com>

The incremental mode of CMS was implemented for running CMS on machines with a single hardware thread. The idea was that CMS does work concurrently with the application, but with a single hardware thread, when CMS ran in a concurrent phase it would still be stopping the application (CMS would use the single hardware thread and there would be nothing for the application to run on). So the incremental mode of CMS does the concurrent phases in increments and gives up the hardware thread to the application on a regular basis. On a 4 processor machine I would not recommend the incremental mode. If you have not tried the regular CMS (no incremental mode), I recommend that you try it. Overall it is more efficient.

On 11/6/10 11:32 PM, Dori Rabin wrote:
> Thanks for your replies...
> I am running on jdk1.6.0_04 and the OS is Linux 2.4.21-37.ELsmp; I have 4 processors on the machine.
> About sharing the log file, it might be a little problematic right now, so let's discuss it if there is no other option.
> I also thought of getting rid of the incremental mode, but I am afraid of the effect of long pauses on our realtime application due to the stop-the-world phases of CMS....
> I didn't quite understand what exactly you meant by "scale back the # parallel threads used by the CMS..." Is there a parameter I need to set for that? If yes, what should its value be?
> Thanks
> Dori
From y.s.ramakrishna at oracle.com Mon Nov 8 09:11:56 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Mon, 08 Nov 2010 09:11:56 -0800
Subject: problem in gc with incremental mode
In-Reply-To: <4CD776C2.3070601@oracle.com>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com> <4CD776C2.3070601@oracle.com>
Message-ID: <4CD82F5C.20806@oracle.com>

I agree completely with Jon's recommendation. As regards your question:

>>> I didn't quite understand what exactly you meant by "scale back the #
>>> parallel threads used by the CMS..." is there a parameter I need to
>>> set for that ? if yes what should be its value ?

On a 4-processor machine, you get a single concurrent marking thread, so you do not need to do anything specific since you are already at the minimum # of concurrent worker threads used by CMS.

-- ramki
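A small addendum on where that single marking thread comes from: in the HotSpot builds of this period the concurrent marking thread count is controlled by ParallelCMSThreads (the same flag Alexander sets to 24 above), and its default is derived from ParallelGCThreads, roughly (ParallelGCThreads + 3) / 4, which gives 1 on a 4-CPU machine and so matches ramki's statement. Treat the exact formula as approximate, since it is an implementation detail that has varied across releases. On a bigger box the count can be pinned explicitly, for example:

  -XX:ParallelGCThreads=8 -XX:ParallelCMSThreads=2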
>> -- ramki >> >> On 11/04/10 06:27, Rabin Dori wrote: >> > Hi, >> > >> > Once in a while and for a reason I cannot understand the CMS >> kicks up >> > too late which causes a promotion failure and full GC that takes >> very >> > long (more than 2 minutes which causes other problems)? >> > >> > My question is how to tune the gc flags in order to make sure >> that the >> > concurrent sweep will always occur in parallel (incremental mode) >> > without long pause stop the world but also without reaching its >> maximum >> > capacity ? >> > >> > >> > >> > (I know that in my case the *CMSInitiatingOccupancyFraction*=60 is >> > ignored because of the CMSIncrementalMode >> > >> > And from looking in the log file we can see that the old generation >> > reaches size of 835?959K out of 843?000K at the time the concurrent >> > failure (I marked this line in red) >> > >> > >> > >> > *_I am running the jvm with the following parameters :_* >> > >> > wrapper.java.additional.4=-XX:NewSize=200m >> > >> > wrapper.java.additional.5=-XX:SurvivorRatio=6 >> > >> > wrapper.java.additional.6=-XX:MaxTenuringThreshold=4 >> > >> > wrapper.java.additional.7=-XX:+CMSIncrementalMode >> > >> > wrapper.java.additional.8=-XX:+CMSIncrementalPacing >> > >> > wrapper.java.additional.9=-XX:+DisableExplicitGC >> > >> > wrapper.java.additional.10=-XX:+UseConcMarkSweepGC >> > >> > wrapper.java.additional.11=-XX:+CMSClassUnloadingEnabled >> > >> > wrapper.java.additional.12=-XX:+PrintGCDetails >> > >> > wrapper.java.additional.13=-XX:+PrintGCTimeStamps >> > >> > wrapper.java.additional.14=-XX:-TraceClassUnloading >> > >> > wrapper.java.additional.15=-XX:+HeapDumpOnOutOfMemoryError >> > >> > wrapper.java.additional.16=-verbose:gc >> > >> > wrapper.java.additional.17=-XX:CMSInitiatingOccupancyFraction=60 >> > >> > wrapper.java.additional.18=-XX:+UseCMSInitiatingOccupancyOnly >> > >> > wrapper.java.additional.19=-XX:+PrintTenuringDistribution >> > >> > >> > >> > >> > >> > *_Extracts from the log file:_* >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | Desired survivor size >> 13107200 >> > bytes, new threshold 4 (max 4) >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 1: 544000 >> > bytes, 544000 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 2: 346320 >> > bytes, 890320 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 3: 262800 >> > bytes, 1153120 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 4: 238528 >> > bytes, 1391648 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | : 155621K->2065K(179200K), >> > 0.1046330 secs] 988712K->835373K(1022976K) icms_dc=0 , 0.1047500 >> secs] >> > [Times: user=0.00 sys=0.00, real=0.11 secs] >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | 422097.583: [GC >> 422097.583: >> > [ParNew >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | Desired survivor size >> 13107200 >> > bytes, new threshold 4 (max 4) >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 1: 577104 >> > bytes, 577104 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 2: 261856 >> > bytes, 838960 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 3: 298832 >> > bytes, 1137792 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 4: 259176 >> > bytes, 1396968 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | : 155664K->2386K(179200K), >> > 0.0498010 secs] 988972K->835920K(1022976K) icms_dc=0 , 0.0499370 >> secs] >> > [Times: user=0.00 sys=0.00, real=0.05 secs] >> > >> > INFO | jvm 1 | 2010/11/02 04:57:27 | 422190.993: [GC >> 
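A minimal sketch of what Jon's suggestion above amounts to, in plain JVM-flag
form (illustrative only: once incremental mode is removed, the occupancy
settings already in the original flag list start to take effect, and pause
times should be re-measured under load before settling on anything):

    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=60
    -XX:+UseCMSInitiatingOccupancyOnly

i.e. drop -XX:+CMSIncrementalMode and -XX:+CMSIncrementalPacing and leave the
NewSize/SurvivorRatio/MaxTenuringThreshold and GC logging flags as they were.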
From darragh.curran at gmail.com Wed Nov 10 07:47:01 2010
From: darragh.curran at gmail.com (Darragh Curran)
Date: Wed, 10 Nov 2010 15:47:01 +0000
Subject: Full GC run freeing no memory despite many unreachable objects
Message-ID:

Hi,

I hope someone can help me understand this better.

I'm running java version "Java(TM) SE Runtime Environment (build
1.6.0_19-b04) Java HotSpot(TM) Server VM (build 16.2-b04, mixed mode)"

It runs tomcat, with jvm heap options '-Xmx1600m -Xms1600m'

Every few weeks a host becomes unavailable for requests. When I look into
it, the java process is using 100% CPU, seems to have no running threads
when I do multiple stack dumps, and based on jstat output appears to spend
all its time doing full gc runs:

14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1107 5656.903 6945.644
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686

I used jmap to dump the heap for analysis. When I analyse it I see that 94%
of objects are unreachable, but not yet collected.

From jstat it appears that gc is running roughly every 10 seconds and
lasting approx 10 seconds, but fails to free any memory.
I'd really appreciate some advice on how to better understand my problem
and what to do to try and fix it.

Thanks,
Darragh

From jon.masamitsu at oracle.com Wed Nov 10 09:03:26 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 10 Nov 2010 09:03:26 -0800
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To:
References:
Message-ID: <4CDAD05E.7080602@oracle.com>

I assume that the Java process does not recover on its own and has to be
killed?

What are the column headings for your jstat output?

Turn on GC logging (if you don't already have it on) and check to see if
your perm gen is full.

If your perm gen is not full but the Java heap appears to be full, then the
garbage collector just thinks that all that data is live. You used jmap.
Do you know what's filling up the heap?

On 11/10/2010 7:47 AM, Darragh Curran wrote:
> Hi,
>
> I hope someone can help me understand this better.
>
> I'm running java version "Java(TM) SE Runtime Environment (build
> 1.6.0_19-b04) Java HotSpot(TM) Server VM (build 16.2-b04, mixed mode)"
>
> It runs tomcat, with jvm heap options '-Xmx1600m -Xms1600m'
>
> Every few weeks a host becomes unavailable for requests. When I look
> into it, the java process is using 100% CPU, seems to have no running
> threads when I do multiple stack dumps, and based on jstat output
> appears to spend all its time doing full gc runs:
> [...]
>
> I used jmap to dump the heap for analysis. When I analyse it I see
> that 94% of objects are unreachable, but not yet collected.
>
> From jstat it appears that gc is running roughly every 10 seconds and
> lasting approx 10 seconds, but fails to free any memory.
>
> I'd really appreciate some advice on how to better understand my
> problem and what to do to try and fix it.
> > Thanks, > Darragh > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From darragh.curran at gmail.com Wed Nov 10 09:22:05 2010 From: darragh.curran at gmail.com (Darragh Curran) Date: Wed, 10 Nov 2010 17:22:05 +0000 Subject: Full GC run freeing no memory despite many unreachable objects In-Reply-To: <4CDAD05E.7080602@oracle.com> References: <4CDAD05E.7080602@oracle.com> Message-ID: Ooops, Here's the jstat output with headings: S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 ... ... ... 
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1107 5656.903 6945.644 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1107 5656.903 6945.644 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 It looks like the perm gen is full (~99%) The process doesn't recover on its own. 
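(For reference, output in this shape -- S0C through GCT -- is what a plain

    jstat -gc <tomcat-pid> 10000

prints; the exact command and sampling interval used here are an assumption,
shown only so the column headings are reproducible.)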
Here's some other gc output from our logs:

Heap
 PSYoungGen      total 531712K, used 517887K [0x92d80000, 0xb42d0000, 0xb42d0000)
  eden space 517888K, 99% used [0x92d80000,0xb273fff8,0xb2740000)
  from space 13824K, 0% used [0xb2740000,0xb2740000,0xb34c0000)
  to   space 14400K, 0% used [0xb34c0000,0xb34c0000,0xb42d0000)
 PSOldGen        total 1092288K, used 1060042K [0x502d0000, 0x92d80000, 0x92d80000)
  object space 1092288K, 97% used [0x502d0000,0x90e02a20,0x92d80000)
 PSPermGen       total 37376K, used 37186K [0x4c2d0000, 0x4e750000, 0x502d0000)
  object space 37376K, 99% used [0x4c2d0000,0x4e720b60,0x4e750000)

Nov 9, 2010 9:29:48 AM sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop
WARNING: RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=36857] throws
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at sun.rmi.runtime.NewThreadAction.run(NewThreadAction.java:116)
        at sun.rmi.runtime.NewThreadAction.run(NewThreadAction.java:34)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$1.newThread(TCPTransport.java:94)
        at java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:672)
        at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:721)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:384)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
        at java.lang.Thread.run(Thread.java:619)

From jmap I can see that most of the unreachable objects are
LinkedHashMap.Entry objects that I know are created as part of many requests.

Darragh

On Wed, Nov 10, 2010 at 5:03 PM, Jon Masamitsu wrote:
> I assume that the Java process does not recover on its
> own and has to be killed?
> [...]

From y.s.ramakrishna at oracle.com Wed Nov 10 10:03:56 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Wed, 10 Nov 2010 10:03:56 -0800
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To:
References: <4CDAD05E.7080602@oracle.com>
Message-ID: <4CDADE8C.6000503@oracle.com>

Have you tried, say, doubling the size of your heap (old gen and perm gen)
to see what happens to the problem? Maybe you are using a heap that is much
too small for your application's natural footprint on the platform on which
you are running it? If you believe the process heap needs should not be so
large, check what the heap contains -- does a jmap -histo:live show many
objects as live that you believe should have been collected? Have you tried
to determine, using a tool such as jhat on these jmap heap dumps, whether
your application has a real leak after all?

If you have determined that you do not have a leak, can you share a test
case? (Also, please first try the latest 6u23 JDK before you try to design
a test case.)
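For instance (illustrative invocations and sizes only -- substitute the real
tomcat pid, and re-measure before settling on numbers):

    jmap -histo:live <pid> | head -30
    jmap -dump:live,format=b,file=tomcat-heap.hprof <pid>
    jhat -J-Xmx2g tomcat-heap.hprof

and, for the sizing experiment, something like -Xms3200m -Xmx3200m
-XX:MaxPermSize=128m in place of the current 1600m settings.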
best regards.
-- ramki

On 11/10/10 09:22, Darragh Curran wrote:
> Ooops, Here's the jstat output with headings:
> [...]

From darragh.curran at gmail.com Thu Nov 11 03:50:55 2010
From: darragh.curran at gmail.com (Darragh Curran)
Date: Thu, 11 Nov 2010 11:50:55 +0000
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To: <4CDADE8C.6000503@oracle.com>
References: <4CDAD05E.7080602@oracle.com> <4CDADE8C.6000503@oracle.com>
Message-ID:

Thanks Ramki,

I know what I really need to do is find how to consistently reproduce this,
then look at ways to either fix a problem with our code, tune gc settings
or (least likely) discover a bug in hotspot.

Before I do that, I was wondering if anyone had insight into what would
cause the expensive full gc runs consuming 100% CPU and lasting for ~10
seconds without freeing any memory, despite there being many unreachable
objects (according to the heap dump).

Is there a cap on the length of a full GC run?
Perhaps it's taking all its time to traverse the heap and gets cancelled
before it does any collecting?

Best regards,
Darragh

On Wed, Nov 10, 2010 at 6:03 PM, Y. S. Ramakrishna wrote:
> Have you tried say doubling the size of your heap (old gen and
> perm gen) to see what happens to the problem? [...]

From jon.masamitsu at oracle.com Thu Nov 11 06:48:09 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 11 Nov 2010 06:48:09 -0800
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To:
References: <4CDAD05E.7080602@oracle.com> <4CDADE8C.6000503@oracle.com>
Message-ID: <4CDC0229.9040600@oracle.com>

Darragh,

The GC thinks that all the data in the heap is live so that is why no space
is being recovered at a full GC. There is no time limit on the collections
(for this collector). The full collection does complete.

I don't know why the tools you've used say the objects are unreachable.
My tendency is, of course, to believe the GC before the tools :-).

Jon

On 11/11/10 03:50, Darragh Curran wrote:
> Thanks Ramki,
>
> I know what I really need to do is find how to consistently reproduce
> this, then look at ways to either fix a problem with our code, tune gc
> settings or (least likely) discover a bug in hotspot. [...]

From shane.cox at gmail.com Fri Nov 12 11:31:53 2010
From: shane.cox at gmail.com (Shane Cox)
Date: Fri, 12 Nov 2010 14:31:53 -0500
Subject: Setting Min and Max Heap Size to the same value affects size of Young Gen
Message-ID:

This is probably well known behavior, but why does setting the Min and Max
Heap Size to the same value affect the default size of the Young Generation?
For example: Scenario 1: -d64 -Xms1536m -Xmx4096m -XX:+UseConcMarkSweepGC Young Generation is small: 18,624K {Heap before GC invocations=0 (full 0): par new generation total 18624K, used 16000K [0xfffffd7ef4e00000, 0xfffffd7ef62c0000, 0xfffffd7f05c60000) eden space 16000K, 100% used [0xfffffd7ef4e00000, 0xfffffd7ef5da0000, 0xfffffd7ef5da0000) from space 2624K, 0% used [0xfffffd7ef5da0000, 0xfffffd7ef5da0000, 0xfffffd7ef6030000) to space 2624K, 0% used [0xfffffd7ef6030000, 0xfffffd7ef6030000, 0xfffffd7ef62c0000) concurrent mark-sweep generation total 1551616K, used 0K [0xfffffd7f05c60000, 0xfffffd7f647a0000, 0xfffffd7ff4e00000) concurrent-mark-sweep perm gen total 21248K, used 7265K [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) 2010-11-12T14:00:16.083-0500: 0.364: [GC 0.364: [ParNew: 16000K->2150K(18624K), 0.0048839 secs] 16000K->2150K(1570240K), 0.0049538 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] Scenario 2: -d64 -Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC Young Generation is much larger: 172,032K {Heap before GC invocations=0 (full 0): par new generation total 172032K, used 147456K [0xfffffd7f94e00000, 0xfffffd7fa0e00000, 0xfffffd7fa0e00000) eden space 147456K, 100% used [0xfffffd7f94e00000, 0xfffffd7f9de00000, 0xfffffd7f9de00000) from space 24576K, 0% used [0xfffffd7f9de00000, 0xfffffd7f9de00000, 0xfffffd7f9f600000) to space 24576K, 0% used [0xfffffd7f9f600000, 0xfffffd7f9f600000, 0xfffffd7fa0e00000) concurrent mark-sweep generation total 1376256K, used 0K [0xfffffd7fa0e00000, 0xfffffd7ff4e00000, 0xfffffd7ff4e00000) concurrent-mark-sweep perm gen total 21248K, used 12639K [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) 2010-11-12T11:53:01.376-0500: 360.088: [GC 360.088: [ParNew: 147456K->7373K(172032K), 0.0093910 secs] 147456K->7373K(1548288K), 0.0094709 secs] [Times: user=0.03 sys=0.02, real=0.01 secs] jinfo reports the value of -XX:CMSYoungGenPerWorker=16777216 in both scenarios, as well as -XX:ParallelGCThreads=13. So it's unclear to me why the Young Gen would be so small when -Xms and -Xmx are different values (Scenario 1). java -version java version "1.6.0_14" Java(TM) SE Runtime Environment (build 1.6.0_14-b08) Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) Any insights would be appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101112/e59a9f83/attachment.html From y.s.ramakrishna at oracle.com Fri Nov 12 13:24:46 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 12 Nov 2010 13:24:46 -0800 Subject: Setting Min and Max Heap Size to the same value affects size of Young Gen In-Reply-To: References: Message-ID: <4CDDB09E.1000302@oracle.com> Sounds like a bug to me. I'll check the latest JDK when i get the chance. -- ramki On 11/12/2010 11:31 AM, Shane Cox wrote: > This is probably well known behavior, but why does setting the Min and Max > Heap Size to the same value affect the default size of the Young > Generation? 
For example: > > Scenario 1: > -d64 -Xms1536m -Xmx4096m -XX:+UseConcMarkSweepGC > > Young Generation is small: 18,624K > > {Heap before GC invocations=0 (full 0): > par new generation total 18624K, used 16000K [0xfffffd7ef4e00000, > 0xfffffd7ef62c0000, 0xfffffd7f05c60000) > eden space 16000K, 100% used [0xfffffd7ef4e00000, 0xfffffd7ef5da0000, > 0xfffffd7ef5da0000) > from space 2624K, 0% used [0xfffffd7ef5da0000, 0xfffffd7ef5da0000, > 0xfffffd7ef6030000) > to space 2624K, 0% used [0xfffffd7ef6030000, 0xfffffd7ef6030000, > 0xfffffd7ef62c0000) > concurrent mark-sweep generation total 1551616K, used 0K > [0xfffffd7f05c60000, 0xfffffd7f647a0000, 0xfffffd7ff4e00000) > concurrent-mark-sweep perm gen total 21248K, used 7265K > [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) > 2010-11-12T14:00:16.083-0500: 0.364: [GC 0.364: [ParNew: > 16000K->2150K(18624K), 0.0048839 secs] 16000K->2150K(1570240K), 0.0049538 > secs] [Times: user=0.02 sys=0.01, real=0.01 secs] > > > > Scenario 2: > -d64 -Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC > > Young Generation is much larger: 172,032K > > {Heap before GC invocations=0 (full 0): > par new generation total 172032K, used 147456K [0xfffffd7f94e00000, > 0xfffffd7fa0e00000, 0xfffffd7fa0e00000) > eden space 147456K, 100% used [0xfffffd7f94e00000, 0xfffffd7f9de00000, > 0xfffffd7f9de00000) > from space 24576K, 0% used [0xfffffd7f9de00000, 0xfffffd7f9de00000, > 0xfffffd7f9f600000) > to space 24576K, 0% used [0xfffffd7f9f600000, 0xfffffd7f9f600000, > 0xfffffd7fa0e00000) > concurrent mark-sweep generation total 1376256K, used 0K > [0xfffffd7fa0e00000, 0xfffffd7ff4e00000, 0xfffffd7ff4e00000) > concurrent-mark-sweep perm gen total 21248K, used 12639K > [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) > 2010-11-12T11:53:01.376-0500: 360.088: [GC 360.088: [ParNew: > 147456K->7373K(172032K), 0.0093910 secs] 147456K->7373K(1548288K), 0.0094709 > secs] [Times: user=0.03 sys=0.02, real=0.01 secs] > > > > jinfo reports the value of -XX:CMSYoungGenPerWorker=16777216 in both > scenarios, as well as -XX:ParallelGCThreads=13. So it's unclear to me why > the Young Gen would be so small when -Xms and -Xmx are different values > (Scenario 1). > > > java -version > java version "1.6.0_14" > Java(TM) SE Runtime Environment (build 1.6.0_14-b08) > Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) > > > Any insights would be appreciated. > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From y.s.ramakrishna at oracle.com Mon Nov 15 10:22:20 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Mon, 15 Nov 2010 10:22:20 -0800 Subject: Setting Min and Max Heap Size to the same value affects size of Young Gen In-Reply-To: <4CDDB09E.1000302@oracle.com> References: <4CDDB09E.1000302@oracle.com> Message-ID: <4CE17A5C.3080202@oracle.com> Hi Shane -- Yes, the bug exists also in the latest JDK 7 builds. I am looking into it and filed the following bug to track it:- 7000125 CMS: Anti-monotone young gen sizing with respect to maximum whole heap size specification -- ramki On 11/12/10 13:24, Y. Srinivas Ramakrishna wrote: > Sounds like a bug to me. I'll check the latest JDK when i get the chance. > > -- ramki > > On 11/12/2010 11:31 AM, Shane Cox wrote: >> This is probably well known behavior, but why does setting the Min and Max >> Heap Size to the same value affect the default size of the Young >> Generation? 
For example: >> >> Scenario 1: >> -d64 -Xms1536m -Xmx4096m -XX:+UseConcMarkSweepGC >> >> Young Generation is small: 18,624K >> >> {Heap before GC invocations=0 (full 0): >> par new generation total 18624K, used 16000K [0xfffffd7ef4e00000, >> 0xfffffd7ef62c0000, 0xfffffd7f05c60000) >> eden space 16000K, 100% used [0xfffffd7ef4e00000, 0xfffffd7ef5da0000, >> 0xfffffd7ef5da0000) >> from space 2624K, 0% used [0xfffffd7ef5da0000, 0xfffffd7ef5da0000, >> 0xfffffd7ef6030000) >> to space 2624K, 0% used [0xfffffd7ef6030000, 0xfffffd7ef6030000, >> 0xfffffd7ef62c0000) >> concurrent mark-sweep generation total 1551616K, used 0K >> [0xfffffd7f05c60000, 0xfffffd7f647a0000, 0xfffffd7ff4e00000) >> concurrent-mark-sweep perm gen total 21248K, used 7265K >> [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) >> 2010-11-12T14:00:16.083-0500: 0.364: [GC 0.364: [ParNew: >> 16000K->2150K(18624K), 0.0048839 secs] 16000K->2150K(1570240K), 0.0049538 >> secs] [Times: user=0.02 sys=0.01, real=0.01 secs] >> >> >> >> Scenario 2: >> -d64 -Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC >> >> Young Generation is much larger: 172,032K >> >> {Heap before GC invocations=0 (full 0): >> par new generation total 172032K, used 147456K [0xfffffd7f94e00000, >> 0xfffffd7fa0e00000, 0xfffffd7fa0e00000) >> eden space 147456K, 100% used [0xfffffd7f94e00000, 0xfffffd7f9de00000, >> 0xfffffd7f9de00000) >> from space 24576K, 0% used [0xfffffd7f9de00000, 0xfffffd7f9de00000, >> 0xfffffd7f9f600000) >> to space 24576K, 0% used [0xfffffd7f9f600000, 0xfffffd7f9f600000, >> 0xfffffd7fa0e00000) >> concurrent mark-sweep generation total 1376256K, used 0K >> [0xfffffd7fa0e00000, 0xfffffd7ff4e00000, 0xfffffd7ff4e00000) >> concurrent-mark-sweep perm gen total 21248K, used 12639K >> [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) >> 2010-11-12T11:53:01.376-0500: 360.088: [GC 360.088: [ParNew: >> 147456K->7373K(172032K), 0.0093910 secs] 147456K->7373K(1548288K), 0.0094709 >> secs] [Times: user=0.03 sys=0.02, real=0.01 secs] >> >> >> >> jinfo reports the value of -XX:CMSYoungGenPerWorker=16777216 in both >> scenarios, as well as -XX:ParallelGCThreads=13. So it's unclear to me why >> the Young Gen would be so small when -Xms and -Xmx are different values >> (Scenario 1). >> >> >> java -version >> java version "1.6.0_14" >> Java(TM) SE Runtime Environment (build 1.6.0_14-b08) >> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) >> >> >> Any insights would be appreciated. >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From dorirabin at gmail.com Mon Nov 8 09:27:13 2010 From: dorirabin at gmail.com (Dori Rabin) Date: Mon, 8 Nov 2010 19:27:13 +0200 Subject: problem in gc with incremental mode In-Reply-To: <4CD82F5C.20806@oracle.com> References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com> <4CD776C2.3070601@oracle.com> <4CD82F5C.20806@oracle.com> Message-ID: Well thank you both I will try these recommendations hoping for the best Thanks Dori On Mon, Nov 8, 2010 at 7:11 PM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > I agree completely with Jon's recommendation. 
As regards, your question:- > > > I didn't quiet understand what exactly you meant by "scale back the # >>>> parallel threads used by the CMS..." is there a parameter I need to set for >>>> that ? if yes what should be its value ? >>>> >>> > On a 4-processor machine, you get a single concurrent marking thread, so > you do not need to do anything specific since you are already at the > minimum > # concurrent worker threads used by CMS. > > -- ramki > > > On 11/07/10 20:04, Jon Masamitsu wrote: > >> The incremental mode of CMS was implemented for running CMS on machines >> with a single hardware thread. The idea was that CMS does work >> concurrently >> with the application but with a single hardware thread, when CMS ran in >> a concurrent phase, it would still be stopping the application (CMS would >> use the single hardware thread and there would be nothing for the >> application >> to run on). So the incremental mode of CMS does the concurrent phases >> in increments and gives up the hardware thread to the application on >> a regular basis. On a 4 processor machine I would not recommend the >> incremental >> mode. If you have not tried the regular CMS (no incremental mode), >> I recommend that you try it. Overall it is more efficient. >> >> On 11/6/10 11:32 PM, Dori Rabin wrote: >> >>> Thanks fo your replies... >>> I am running on jdk1.6.0_04 and OS is Linux 2.4.21-37.ELsmp, I have 4 >>> processors on the machine >>> About sharing the log file it might be a little problematic now so let's >>> discuss it if no other option I also thought of getting rid of the >>> incremental mode but I am afraid of the effect of long pauses on our >>> realtime application due to the stop the world phase of CMS.... >>> I didn't quiet understand what exactly you meant by "scale back the # >>> parallel threads used by the CMS..." is there a parameter I need to set for >>> that ? if yes what should be its value ? >>> Thanks >>> Dori >>> On Thu, Nov 4, 2010 at 6:51 PM, Y. S. Ramakrishna < >>> y.s.ramakrishna at oracle.com > wrote: >>> >>> Hi Dori -- >>> >>> What's the version of JDK you are running? Can you share a >>> complete log? >>> It appears as though the iCMS "auto-pacing" is, for some reason, not >>> kicking in "soon enough"; one workaround is to use turn off >>> auto-pacing >>> and use an explicit duty-cycle (which has its own disadvantages). >>> >>> I'd suggest filing a bug, and including a complete log file showing >>> the problem. >>> >>> thanks. >>> -- ramki >>> >>> On 11/04/10 06:27, Rabin Dori wrote: >>> > Hi, >>> > >>> > Once in a while and for a reason I cannot understand the CMS >>> kicks up >>> > too late which causes a promotion failure and full GC that takes >>> very >>> > long (more than 2 minutes which causes other problems)? >>> > >>> > My question is how to tune the gc flags in order to make sure >>> that the >>> > concurrent sweep will always occur in parallel (incremental mode) >>> > without long pause stop the world but also without reaching its >>> maximum >>> > capacity ? 
>>> > >>> > >>> > >>> > (I know that in my case the *CMSInitiatingOccupancyFraction*=60 is >>> > ignored because of the CMSIncrementalMode >>> > >>> > And from looking in the log file we can see that the old generation >>> > reaches size of 835?959K out of 843?000K at the time the concurrent >>> > failure (I marked this line in red) >>> > >>> > >>> > >>> > *_I am running the jvm with the following parameters :_* >>> > >>> > wrapper.java.additional.4=-XX:NewSize=200m >>> > >>> > wrapper.java.additional.5=-XX:SurvivorRatio=6 >>> > >>> > wrapper.java.additional.6=-XX:MaxTenuringThreshold=4 >>> > >>> > wrapper.java.additional.7=-XX:+CMSIncrementalMode >>> > >>> > wrapper.java.additional.8=-XX:+CMSIncrementalPacing >>> > >>> > wrapper.java.additional.9=-XX:+DisableExplicitGC >>> > >>> > wrapper.java.additional.10=-XX:+UseConcMarkSweepGC >>> > >>> > wrapper.java.additional.11=-XX:+CMSClassUnloadingEnabled >>> > >>> > wrapper.java.additional.12=-XX:+PrintGCDetails >>> > >>> > wrapper.java.additional.13=-XX:+PrintGCTimeStamps >>> > >>> > wrapper.java.additional.14=-XX:-TraceClassUnloading >>> > >>> > wrapper.java.additional.15=-XX:+HeapDumpOnOutOfMemoryError >>> > >>> > wrapper.java.additional.16=-verbose:gc >>> > >>> > wrapper.java.additional.17=-XX:CMSInitiatingOccupancyFraction=60 >>> > >>> > wrapper.java.additional.18=-XX:+UseCMSInitiatingOccupancyOnly >>> > >>> > wrapper.java.additional.19=-XX:+PrintTenuringDistribution >>> > >>> > >>> > >>> > >>> > >>> > *_Extracts from the log file:_* >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 1: 544000 >>> > bytes, 544000 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 2: 346320 >>> > bytes, 890320 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 3: 262800 >>> > bytes, 1153120 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 4: 238528 >>> > bytes, 1391648 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | : 155621K->2065K(179200K), >>> > 0.1046330 secs] 988712K->835373K(1022976K) icms_dc=0 , 0.1047500 >>> secs] >>> > [Times: user=0.00 sys=0.00, real=0.11 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | 422097.583: [GC >>> 422097.583: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 1: 577104 >>> > bytes, 577104 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 2: 261856 >>> > bytes, 838960 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 3: 298832 >>> > bytes, 1137792 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 4: 259176 >>> > bytes, 1396968 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | : 155664K->2386K(179200K), >>> > 0.0498010 secs] 988972K->835920K(1022976K) icms_dc=0 , 0.0499370 >>> secs] >>> > [Times: user=0.00 sys=0.00, real=0.05 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:27 | 422190.993: [GC >>> 422190.993: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 1: 676656 >>> > bytes, 676656 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 2: 283376 >>> > bytes, 960032 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 3: 239472 
>>> > bytes, 1199504 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 4: 264960 >>> > bytes, 1464464 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | : 155986K->1918K(179200K), >>> > 0.0652010 secs] 989520K->835699K(1022976K) icms_dc=0 , 0.0653200 >>> secs] >>> > [Times: user=0.01 sys=0.00, real=0.07 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | 422277.406: [GC >>> 422277.406: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 1: 615944 >>> > bytes, 615944 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 2: 334120 >>> > bytes, 950064 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 3: 276736 >>> > bytes, 1226800 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 4: 236424 >>> > bytes, 1463224 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | : 155518K->1928K(179200K), >>> > 0.0378730 secs] 989299K->835959K(1022976K) icms_dc=0 , 0.0379920 >>> secs] >>> > [Times: user=0.00 sys=0.00, real=0.04 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | 422366.439: [GC >>> 422366.439: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | (promotion failed) >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 1: 574000 >>> > bytes, 574000 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 2: 315432 >>> > bytes, 889432 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 3: 281216 >>> > bytes, 1170648 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 4: 271776 >>> > bytes, 1442424 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | : >>> 155528K->155689K(179200K), >>> > 0.1007840 secs]422366.540: [CMS >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor121] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor119] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor124] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor123] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor120] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor122] >>> > >>> > ERROR | wrapper | 2010/11/02 05:02:37 | JVM appears hung: >>> Timed out >>> > waiting for signal from JVM. >>> > >>> > >>> > >>> > >>> > >>> > *Dori Rabin* >>> > >>> > *cid:image001.gif at 01CB69E7.E5E45760* >>> > >>> > >>> > >>> > *cid:image002.jpg at 01CB69E7.E5E45760* >>> > >>> > T. +972-3-123-4567 F. +972-3- 766-3559 M. +972-54- 4232-706 >>> > >>> > Email: mailto:Dori >> >>> >.Rabin at starhome.com >>> >>> >>> > >>> > >>> > >>> > >>> > >>> > *cid:image003.gif at 01CB69E7.E5E45760* >>> > *cid:image004.gif at 01CB69E7.E5E45760* >>> > *cid:image005.gif at 01CB69E7.E5E45760* >>> > *cid:image006.gif at 01CB69E7.E5E45760* >>> > *cid:image007.gif at 01CB69E7.E5E45760* >>> > >>> > >>> > This email contains proprietary and/or confidential information of >>> > Starhome. If you >>> > >>> > have received this email in error, please delete all copies without >>> > delay and do not >>> > >>> > copy, distribute, or rely on any information contained in this >>> email. 
>>> > Thank you! >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> >>> ------------------------------------------------------------------------ >>> > >>> > _______________________________________________ >>> > hotspot-gc-use mailing list >>> > hotspot-gc-use at openjdk.java.net >>> >>> >>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101108/c66c2d10/attachment-0001.html From brian.williams at mayalane.com Mon Nov 15 11:29:26 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Mon, 15 Nov 2010 14:29:26 -0500 Subject: CMS Promotion Failures Message-ID: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Greetings, We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. Here are the general principles that we've arrived at to delay the promotion failure: 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. 2. Use as large of heap as possible regardless of the size of the database cache that's needed. 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. A few questions 1. Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? 2. Would running with -XX:+AlwaysPreTouch make any difference? 3. We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? 4. Would changing any of the PLAB/TLAB settings make a difference? 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? 6. Are there any other JVM settings that we should try, other advice? By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. Sorry for all of the questions. We definitely appreciate any help you can offer. Brian From y.s.ramakrishna at oracle.com Mon Nov 15 14:16:05 2010 From: y.s.ramakrishna at oracle.com (Y. S. 
Ramakrishna) Date: Mon, 15 Nov 2010 14:16:05 -0800 Subject: CMS Promotion Failures In-Reply-To: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Message-ID: <4CE1B125.2020302@oracle.com> On 11/15/10 11:29, Brian Williams wrote: > Greetings, > We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. > > Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. > > Here are the general principles that we've arrived at to delay the promotion failure: > > 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. > > 2. Use as large of heap as possible regardless of the size of the database cache that's needed. > > 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. > > A few questions > > 1. Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? > Somewhere in between. My experience has been that you want yr CMS cycles to be neither too frequent, nor too infrequent. > 2. Would running with -XX:+AlwaysPreTouch make any difference? Only initially, until all of the old gen pages get objects promoted into them. On Solaris at least there is sometimes a cost from first touch, expecially if using very large pages. The pre-touch moves that cost out of the scavenges to the start-up phase. > > 3. We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? > The basic idea is as you say in (1), promote only medium- and long-lived data. In other words, never promote any short-lived data, even under sudden load spikes. > 4. Would changing any of the PLAB/TLAB settings make a difference? These are autonomically sized and it's unlikely that a static setting will outperform the adaption, epsecially if you do not have steady loads. > > 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? > Yes. :-) (More seriously the cost is proportional to the amount copied, i.e. live data, and the size of the heap, i.e. also the dead data; the overhead is also slightly higher if you have many small as opposed to a few large objects.) > 6. Are there any other JVM settings that we should try, other advice? Controlling promotion rate and avoiding premature promotion of short-lived data is the most important piece of advice. 
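As a concrete illustration of that advice -- the numbers below are only hypothetical starting points, not values taken from this thread -- a promotion-controlling flag set typically pairs a generous young generation and roomy survivor spaces with a tenuring threshold high enough that short-lived objects die before they tenure, plus the tenuring distribution output to verify it:

    -Xmn2g -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=8
    -XX:+PrintTenuringDistribution

The PrintTenuringDistribution lines (of the kind quoted earlier in this digest) are the check: if the per-age byte totals shrink to near zero before the maximum age is reached, the objects that still get promoted really are medium- or long-lived, and the remaining promotion is the unavoidable part.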
> > By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. Try giving G1 a bit more heap, and instead of constraining generation sizes to what worked best for CMS, just specify a pause-time (start higher and slowly iterate lower) and let G1's autonomics find an optimal partitioning of the heap. There are probably a few not yet known sharp corners of G1 that if you bring to our attention we can try and fix. One current disadvantage of G1 which is planned to be fixed soon, is that we do not deal with Reference onjects during scavenges, so this can place G1 at a great disadvantage in terms of carrying a lot more garbage, if your application happens to use Reference objects (perhaps under the covers by the JDK libraries that you are using). Look at the GC tuning talk by Charlie Hunt and Tony Printezis in this year's JavaOne for some good advice on GC tuning in general and CMS tuning in particular. Hopefully they will also include G1 tuning into such a talk next year :-) best. -- ramki > > Sorry for all of the questions. We definitely appreciate any help you can offer. > > Brian > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From shaun.hennessy at alcatel-lucent.com Mon Nov 15 17:43:17 2010 From: shaun.hennessy at alcatel-lucent.com (Shaun Hennessy) Date: Mon, 15 Nov 2010 20:43:17 -0500 Subject: CMS Promotion Failures In-Reply-To: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Message-ID: <4CE1E1B5.3080302@alcatel-lucent.com> Brian Williams wrote: > Greetings, > We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. > > Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. > > Here are the general principles that we've arrived at to delay the promotion failure: > > 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. > > 2. Use as large of heap as possible regardless of the size of the database cache that's needed. > > 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. > > A few questions > If the goal is just avoiding/delaying promotion failures the above all sound like good ideas to achieve your goal -- if any negatives they cause aren't a problem. As for the below I would set CMSInitiatingOccupancyFraction to lower value (ie 75, or 65, etc..) - if you set the value low enough, triggering CMS collection sooner -- could you not avoid promotion failures? I assume the point is to avoid the STW time spent by the promotion failure --- and manually triggering periodic System.gc()'s while app is running normally would be no better than the promotion failures in the first place? ( Throughput collector is not an option?) > 1. 
Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? > > 2. Would running with -XX:+AlwaysPreTouch make any difference? > > 3. We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? > > 4. Would changing any of the PLAB/TLAB settings make a difference? > > 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? > > 6. Are there any other JVM settings that we should try, other advice? > > By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. > > Sorry for all of the questions. We definitely appreciate any help you can offer. > > Brian > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From jon.masamitsu at oracle.com Mon Nov 15 18:31:04 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Mon, 15 Nov 2010 18:31:04 -0800 Subject: CMS Promotion Failures In-Reply-To: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Message-ID: <4CE1ECE8.9040804@oracle.com> Brian, Ramki and Shaun have addressed most of your questions I think. Just wanted to know what type of platform (how many hardware threads) you're using. Also what is CMS doing when the promotion failures are happening (concurrent marking, preclean cleaning or sweeping)? Jon On 11/15/2010 11:29 AM, Brian Williams wrote: > Greetings, > We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. > > Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. > > Here are the general principles that we've arrived at to delay the promotion failure: > > 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. > > 2. Use as large of heap as possible regardless of the size of the database cache that's needed. > > 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. > > A few questions > > 1. Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? > > 2. Would running with -XX:+AlwaysPreTouch make any difference? > > 3. 
We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? > > 4. Would changing any of the PLAB/TLAB settings make a difference? > > 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? > > 6. Are there any other JVM settings that we should try, other advice? > > By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. > > Sorry for all of the questions. We definitely appreciate any help you can offer. > > Brian > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From brian.williams at mayalane.com Mon Nov 15 16:36:37 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Mon, 15 Nov 2010 19:36:37 -0500 Subject: CMS Promotion Failures In-Reply-To: <4CE1B125.2020302@oracle.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> Message-ID: <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> Thanks Ramki. If you can entertain a few followup questions: 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? Brian From y.s.ramakrishna at oracle.com Mon Nov 15 19:08:05 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Mon, 15 Nov 2010 19:08:05 -0800 Subject: CMS Promotion Failures In-Reply-To: <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> Message-ID: <4CE1F595.9080601@oracle.com> On 11/15/2010 4:36 PM, Brian Williams wrote: > > Thanks Ramki. If you can entertain a few followup questions: > > 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. I have not seen that behaviour before. The only cases where i can think of that occurring is if the heap occupancy is also montonically increasing so that the "free space" available keeps getting smaller. But I am grasping at straws here. > > 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). > If you are not using +ExplicitGCInvokesConcurrent, then both would leave the heap in an identical state, because they both cause a single-threaded (alas, still) compacting collection of the entire heap. So, yes, scheduling explicit gc's to compact down the heap at an opportune time would definitely be worthwhile, if possible. > 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? Yes, this is usually the case. 
More PLAB's (in the form of cached free lists with the individual GC worker threads) does translate to potentially more fragmentation, although i have generally found that our autonomic per-block inventory control usually results in keeping such fragmentation in check (unless the threads are "far too many" and the free space "too little"). -- ramki > > Brian From brian.williams at mayalane.com Wed Nov 17 06:17:14 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Wed, 17 Nov 2010 09:17:14 -0500 Subject: CMS Promotion Failures In-Reply-To: <4CE1F595.9080601@oracle.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> <4CE1F595.9080601@oracle.com> Message-ID: <7DA80581-3D38-478C-9621-32B561454B63@mayalane.com> On Nov 15, 2010, at 10:08 PM, Y. Srinivas Ramakrishna wrote: >> 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. > > I have not seen that behaviour before. The only cases where i can think of that occurring is if > the heap occupancy is also montonically increasing so that the "free space" available keeps > getting smaller. But I am grasping at straws here. Output from jstat seems to indicate that's not the case here. Unfortunately, we're seeing this on a production server that doesn't have GC logging enabled. We're in the process of trying to get it enabled so we can try to understand this better. > >> >> 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). >> > > If you are not using +ExplicitGCInvokesConcurrent, then both would leave the heap > in an identical state, because they both cause a single-threaded (alas, still) compacting > collection of the entire heap. So, yes, scheduling explicit gc's to compact down > the heap at an opportune time would definitely be worthwhile, if possible. > >> 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? > > Yes, this is usually the case. More PLAB's (in the form of cached free lists with the > individual GC worker threads) does translate to potentially more fragmentation, although > i have generally found that our autonomic per-block inventory control usually results > in keeping such fragmentation in check (unless the threads are "far too many" and the > free space "too little"). We're running on a 32-way x4600 and aren't setting the ParallelGC threads explicitly, so we're probably ending up with 32. We will try to dial it down to see how that helps. Thanks, Brian From brian.williams at mayalane.com Wed Nov 17 06:32:51 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Wed, 17 Nov 2010 09:32:51 -0500 Subject: CMS Promotion Failures In-Reply-To: <4CE1ECE8.9040804@oracle.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1ECE8.9040804@oracle.com> Message-ID: <1135CAF6-F263-4AA7-A70C-FF20735841D4@mayalane.com> Hi Jon, We just received GC detail from one machine, a 16-way x4600. The promotion failure occurs 10 hours into the process life. 
-- application startup --- 2010-11-15T22:34:47.439-0800: [GC [ParNew: 1780514K->209664K(1887488K), 1.1993542 secs] 1780514K->369453K(24956160K), 1.1996619 secs] [Times: user=5.17 sys=1.05, real=1.20 secs] 2010-11-15T22:35:51.297-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.5736733 secs] 2047277K->610880K(24956160K), 0.5739673 secs] [Times: user=2.65 sys=0.44, real=0.57 secs] 2010-11-15T22:37:13.893-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.5714056 secs] 2288704K->838848K(24956160K), 0.5716780 secs] [Times: user=3.32 sys=0.24, real=0.57 secs] 2010-11-15T22:38:28.112-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.6964940 secs] 2516672K->1117914K(24956160K), 0.6967518 secs] [Times: user=4.06 sys=0.31, real=0.70 secs] 2010-11-15T22:40:02.015-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.6151412 secs] 2795738K->1369924K(24956160K), 0.6154426 secs] [Times: user=3.50 sys=0.28, real=0.62 secs] ... 50 CMS cycles pass ... 2010-11-16T10:05:46.116-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.4683483 secs] 8156115K->6706120K(24956160K), 0.4686710 secs] [Times: user=4.09 sys=0.25, real=0.47 secs] 2010-11-16T10:06:15.535-0800: [GC [ParNew: 1887488K->201665K(1887488K), 0.3743882 secs] 8383944K->6904794K(24956160K), 0.3746896 secs] [Times: user=2.88 sys=0.32, real=0.37 secs] 2010-11-16T10:06:25.861-0800: [GC [ParNew: 1879489K->209664K(1887488K), 0.6735419 secs] 8582618K->7164756K(24956160K), 0.6738153 secs] [Times: user=5.83 sys=0.68, real=0.67 secs] 2010-11-16T10:06:26.537-0800: [GC [1 CMS-initial-mark: 6955092K(23068672K)] 7172883K(24956160K), 0.1812929 secs] [Times: user=0.18 sys=0.00, real=0.18 secs] 2010-11-16T10:06:28.457-0800: [CMS-concurrent-mark: 1.708/1.738 secs] [Times: user=8.80 sys=0.16, real=1.74 secs] 2010-11-16T10:06:28.768-0800: [CMS-concurrent-preclean: 0.279/0.311 secs] [Times: user=0.75 sys=0.05, real=0.31 secs] 2010-11-16T10:06:54.938-0800: [GC [ParNew: 1887488K->208834K(1887488K), 0.3598649 secs] 8842580K->7358032K(24956160K), 0.3601703 secs] [Times: user=3.07 sys=0.11, real=0.36 secs] 2010-11-16T10:06:57.753-0800: [CMS-concurrent-abortable-preclean: 28.096/28.986 secs] [Times: user=44.84 sys=2.12, real=28.99 secs] 2010-11-16T10:06:57.755-0800: [GC[YG occupancy: 1050375 K (1887488 K)]2010-11-16T10:06:57.755-0800: [GC [ParNew (promotion failed): 1050375K->1051204K(1887488K), 0.9133199 secs] 8199573K->8337584K(24956160K), 0.9136117 secs] [Times: user=3.90 sys=0.01, real=0.91 secs] [Rescan (parallel) , 0.7407982 secs][weak refs processing, 0.0022770 secs] [1 CMS-remark: 7286379K(23068672K)] 8337584K(24956160K), 1.6572438 secs] [Times: user=7.01 sys=0.06, real=1.66 secs] 2010-11-16T10:07:01.679-0800: [Full GC [CMS2010-11-16T10:07:42.463-0800: [CMS-concurrent-sweep: 43.044/43.050 secs] [Times: user=48.80 sys=0.72, real=43.05 secs] (concurrent mode failure): 6616900K->2119438K(23068672K), 61.3282450 secs] 8504388K->2119438K(24956160K), [CMS Perm : 49108K->48394K(82008K)], 61.3285709 secs] [Times: user=61.33 sys=0.01, real=61.33 secs] 2010-11-16T10:08:08.462-0800: [GC [ParNew: 1677824K->157105K(1887488K), 0.0797175 secs] 3797262K->2276544K(24956160K), 0.0800104 secs] [Times: user=1.00 sys=0.00, real=0.08 secs] 2010-11-16T10:08:36.240-0800: [GC [ParNew: 1834929K->136334K(1887488K), 0.1978673 secs] 3954368K->2386916K(24956160K), 0.1981614 secs] [Times: user=1.72 sys=0.04, real=0.20 secs] The average amount data promoted per ParNew is 187m and we are looking into why it is so large. If you have any insight into this particular promotion failure, we would appreciate it. 
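For anyone reading along, the per-scavenge promotion volume can be read straight off these lines, assuming the old generation only grows (by promotion) during the ParNew stop-the-world window: promoted is roughly (heap_after - young_after) - (heap_before - young_before). For the first post-startup scavenge quoted above:

    old gen before:  8156115K - 1887488K = 6268627K
    old gen after :  6706120K -  209664K = 6496456K
    promoted      :  6496456K - 6268627K =  227829K   (~222 MB)

That is of the same order as the ~187 MB average mentioned here, and promotion on that scale every scavenge fills and fragments the old generation quickly, which is consistent with the earlier advice in this thread about limiting what gets promoted before suspecting anything else.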
Thanks, Brian On Nov 15, 2010, at 9:31 PM, Jon Masamitsu wrote: > Brian, > > Ramki and Shaun have addressed most of your questions I think. > Just wanted to know what type of platform (how many hardware > threads) you're using. Also what is CMS doing when the > promotion failures are happening (concurrent marking, > preclean cleaning or sweeping)? > > Jon > From y.s.ramakrishna at oracle.com Wed Nov 17 09:21:02 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 17 Nov 2010 09:21:02 -0800 Subject: CMS Promotion Failures In-Reply-To: <7DA80581-3D38-478C-9621-32B561454B63@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> <4CE1F595.9080601@oracle.com> <7DA80581-3D38-478C-9621-32B561454B63@mayalane.com> Message-ID: <4CE40EFE.6000304@oracle.com> On 11/17/10 06:17, Brian Williams wrote: > > On Nov 15, 2010, at 10:08 PM, Y. Srinivas Ramakrishna wrote: > >>> 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. >> I have not seen that behaviour before. The only cases where i can think of that occurring is if >> the heap occupancy is also montonically increasing so that the "free space" available keeps >> getting smaller. But I am grasping at straws here. > > Output from jstat seems to indicate that's not the case here. Unfortunately, we're seeing this on a production server that doesn't have GC logging enabled. We're in the process of trying to get it enabled so we can try to understand this better. > OK, thanks. >>> 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). >>> >> If you are not using +ExplicitGCInvokesConcurrent, then both would leave the heap >> in an identical state, because they both cause a single-threaded (alas, still) compacting >> collection of the entire heap. So, yes, scheduling explicit gc's to compact down >> the heap at an opportune time would definitely be worthwhile, if possible. >> >>> 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? >> Yes, this is usually the case. More PLAB's (in the form of cached free lists with the >> individual GC worker threads) does translate to potentially more fragmentation, although >> i have generally found that our autonomic per-block inventory control usually results >> in keeping such fragmentation in check (unless the threads are "far too many" and the >> free space "too little"). > > We're running on a 32-way x4600 and aren't setting the ParallelGC threads explicitly, so we're probably ending up with 32. We will try to dial it down to see how that helps. > I think you get 5/8*n, so prbably closer to 20. With the amount of data that is copied per scavenge and the size of yr old gen, 20 seems reasonable and probably does not need dialing down (at least at first blush). From looking at the snippets you sent, it almost seems like some kind of bug in CMS allocation because there is plenty of free space (and comparatively not that much promotion) when the promotion failure occurs (although full gc logs would be needed before one could be confident of this pronouncement). So it would be worthwhile to investigate this closely to see why this is happening. I somehow do not think this is a tuning issue, but something else. 
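To make the thread-count arithmetic explicit: with the 5/8 rule mentioned above, a 32-way box ends up with roughly 5/8 x 32 = 20 parallel GC worker threads, each keeping its own promotion buffers, which is the fragmentation connection discussed earlier in this thread. If dialing it down is attempted anyway, the count can be pinned with the flag already seen in this digest (the value 8 here is only an example):

    -XX:ParallelGCThreads=8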
Do you have Java support and able to open a formal ticket with Oracle, so some formal/dedicated cycles can be devoted looking at the issue? What's the version of JDK you are running? -- ramki > Thanks, > Brian From tanman12345 at yahoo.com Wed Nov 17 14:44:42 2010 From: tanman12345 at yahoo.com (Erwin) Date: Wed, 17 Nov 2010 14:44:42 -0800 (PST) Subject: Intermittent long ParNew times Message-ID: <288296.29496.qm@web111114.mail.gq1.yahoo.com> Hello, When we?re running our load test for about 1 hour, GC seems to be fine most of the times. However, there are times where the ParNew would go as high as 25 seconds. See below sample where it was 10 seconds. {Heap before GC invocations=11 (full 0): par new generation total 921600K, used 880508K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) from space 102400K, 59% used [0xfffffffe08400000, 0xfffffffe0bfdf1c8, 0xfffffffe0e800000) to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 37814.384: [GC 37814.384: [ParNew: 880508K->55794K(921600K), 0.1246958 secs] 1566486K->741772K(4091904K), 0.1249910 secs] [Times: user=0.37 sys=0.07, real=0.13 secs] Heap after GC invocations=12 (full 0): par new generation total 921600K, used 55794K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) {Heap before GC invocations=12 (full 0): par new generation total 921600K, used 874994K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 39088.225: [GC 39088.225: [ParNew: 874994K->102400K(921600K), 10.0339890 secs] 1560972K->821401K(4091904K), 10.0346984 secs] [Times: user=5.40 sys=31.71, real=10.04 secs] Heap after GC invocations=13 (full 0): par new generation total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) concurrent mark-sweep generation total 3170304K, used 719001K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, 
used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) We?re on 64bit platform of WAS NDE 7.0.0.9 on Solaris10 platform. Our JVM args are: -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled Any help would be appreciated. Erwin From y.s.ramakrishna at oracle.com Wed Nov 17 16:27:02 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 17 Nov 2010 16:27:02 -0800 Subject: Intermittent long ParNew times In-Reply-To: <288296.29496.qm@web111114.mail.gq1.yahoo.com> References: <288296.29496.qm@web111114.mail.gq1.yahoo.com> Message-ID: <4CE472D6.8000105@oracle.com> That long scavenge shows unusually high system time too. Did you make sure there was no periodic (or aperiodic) other activity on the system that may be causing part of the JVM heap to get paged out? I'd check vmstat for starters. (Also, FWIW, and just to rule it out as a factor, check the promotion volume for these scavenges and see if it shows anything.) And while I am throwing out conjectures, does this happen only during the initial start-up phase when the old heap occupancy is growing? If so, see if -XX:+AlwaysPreTouch makes any difference (also mentioned recently by Brian Williams in a separate thread here). -- ramki On 11/17/10 14:44, Erwin wrote: > Hello, > > When we?re running our load test for about 1 hour, GC seems to be fine most of the times. However, there are times where the ParNew would go as high as 25 seconds. See below sample where it was 10 seconds. 
> {Heap before GC invocations=11 (full 0): > par new generation total 921600K, used 880508K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > from space 102400K, 59% used [0xfffffffe08400000, 0xfffffffe0bfdf1c8, 0xfffffffe0e800000) > to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 37814.384: [GC 37814.384: [ParNew: 880508K->55794K(921600K), 0.1246958 secs] 1566486K->741772K(4091904K), 0.1249910 secs] [Times: user=0.37 sys=0.07, real=0.13 secs] > Heap after GC invocations=12 (full 0): > par new generation total 921600K, used 55794K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) > to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > > {Heap before GC invocations=12 (full 0): > par new generation total 921600K, used 874994K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) > to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 39088.225: [GC 39088.225: [ParNew: 874994K->102400K(921600K), 10.0339890 secs] 1560972K->821401K(4091904K), 10.0346984 secs] [Times: user=5.40 sys=31.71, real=10.04 secs] > Heap after GC invocations=13 (full 0): > par new generation total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > concurrent mark-sweep generation total 3170304K, used 719001K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > > We?re on 64bit platform of WAS NDE 7.0.0.9 on Solaris10 platform. 
Our JVM args are: > -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled > > Any help would be appreciated. > Erwin > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
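A closing note on the numbers in that last scavenge: user=5.40 sys=31.71 real=10.04 means the 10 seconds of wall time were spent almost entirely in the kernel, spread across many GC threads, which is why the reply above points at the OS rather than the collector. A minimal first check, assuming a Solaris 10 box as described, is to watch paging activity while the load test runs:

    vmstat 5      (watch the 'sr' page-scan rate and 'free' columns for memory pressure)

and, if the long pauses only occur while the old generation is still being touched for the first time, to try the flag suggested above:

    -XX:+AlwaysPreTouch

so that the cost of first-touching (and, with large pages, zero-filling) old-gen pages is paid once at startup instead of inside a scavenge.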