From Dori.Rabin at Starhome.com Thu Nov 4 06:27:57 2010
From: Dori.Rabin at Starhome.com (Rabin Dori)
Date: Thu, 4 Nov 2010 15:27:57 +0200
Subject: problem in gc with incremental mode
Message-ID: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>

Hi,

Once in a while, and for a reason I cannot understand, the CMS collection kicks in too late, which causes a promotion failure and a full GC that takes very long (more than 2 minutes, which causes other problems)...

My question is how to tune the GC flags so that the concurrent sweep always runs concurrently (incremental mode), without long stop-the-world pauses but also without the old generation reaching its maximum capacity.

(I know that in my case CMSInitiatingOccupancyFraction=60 is ignored because of CMSIncrementalMode.)

From the log file we can see that the old generation reaches a size of 835'959K out of roughly 843'000K at the time of the promotion failure (I marked this line in red).

I am running the JVM with the following parameters:

wrapper.java.additional.4=-XX:NewSize=200m
wrapper.java.additional.5=-XX:SurvivorRatio=6
wrapper.java.additional.6=-XX:MaxTenuringThreshold=4
wrapper.java.additional.7=-XX:+CMSIncrementalMode
wrapper.java.additional.8=-XX:+CMSIncrementalPacing
wrapper.java.additional.9=-XX:+DisableExplicitGC
wrapper.java.additional.10=-XX:+UseConcMarkSweepGC
wrapper.java.additional.11=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional.12=-XX:+PrintGCDetails
wrapper.java.additional.13=-XX:+PrintGCTimeStamps
wrapper.java.additional.14=-XX:-TraceClassUnloading
wrapper.java.additional.15=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.16=-verbose:gc
wrapper.java.additional.17=-XX:CMSInitiatingOccupancyFraction=60
wrapper.java.additional.18=-XX:+UseCMSInitiatingOccupancyOnly
wrapper.java.additional.19=-XX:+PrintTenuringDistribution

Extracts from the log file:

INFO | jvm 1 | 2010/11/02 04:54:33 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 1: 544000 bytes, 544000 total
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 2: 346320 bytes, 890320 total
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 3: 262800 bytes, 1153120 total
INFO | jvm 1 | 2010/11/02 04:54:33 | - age 4: 238528 bytes, 1391648 total
INFO | jvm 1 | 2010/11/02 04:54:33 | : 155621K->2065K(179200K), 0.1046330 secs] 988712K->835373K(1022976K) icms_dc=0 , 0.1047500 secs] [Times: user=0.00 sys=0.00, real=0.11 secs]
INFO | jvm 1 | 2010/11/02 04:55:54 | 422097.583: [GC 422097.583: [ParNew
INFO | jvm 1 | 2010/11/02 04:55:54 |
INFO | jvm 1 | 2010/11/02 04:55:54 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 1: 577104 bytes, 577104 total
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 2: 261856 bytes, 838960 total
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 3: 298832 bytes, 1137792 total
INFO | jvm 1 | 2010/11/02 04:55:54 | - age 4: 259176 bytes, 1396968 total
INFO | jvm 1 | 2010/11/02 04:55:54 | : 155664K->2386K(179200K), 0.0498010 secs] 988972K->835920K(1022976K) icms_dc=0 , 0.0499370 secs] [Times: user=0.00 sys=0.00, real=0.05 secs]
INFO | jvm 1 | 2010/11/02 04:57:27 | 422190.993: [GC 422190.993: [ParNew
INFO | jvm 1 | 2010/11/02 04:57:28 |
INFO | jvm 1 | 2010/11/02 04:57:28 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 1: 676656 bytes, 676656 total
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 2: 283376 bytes, 960032 total
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 3: 239472 bytes, 1199504 total
INFO | jvm 1 | 2010/11/02 04:57:28 | - age 4: 264960 bytes, 1464464 total
INFO | jvm 1 | 2010/11/02 04:57:28 | : 155986K->1918K(179200K), 0.0652010 secs] 989520K->835699K(1022976K) icms_dc=0 , 0.0653200 secs] [Times: user=0.01 sys=0.00, real=0.07 secs]
INFO | jvm 1 | 2010/11/02 04:58:54 | 422277.406: [GC 422277.406: [ParNew
INFO | jvm 1 | 2010/11/02 04:58:54 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 1: 615944 bytes, 615944 total
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 2: 334120 bytes, 950064 total
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 3: 276736 bytes, 1226800 total
INFO | jvm 1 | 2010/11/02 04:58:54 | - age 4: 236424 bytes, 1463224 total
INFO | jvm 1 | 2010/11/02 04:58:54 | : 155518K->1928K(179200K), 0.0378730 secs] 989299K->835959K(1022976K) icms_dc=0 , 0.0379920 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
INFO | jvm 1 | 2010/11/02 05:00:23 | 422366.439: [GC 422366.439: [ParNew
INFO | jvm 1 | 2010/11/02 05:00:23 | (promotion failed)
INFO | jvm 1 | 2010/11/02 05:00:23 | Desired survivor size 13107200 bytes, new threshold 4 (max 4)
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 1: 574000 bytes, 574000 total
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 2: 315432 bytes, 889432 total
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 3: 281216 bytes, 1170648 total
INFO | jvm 1 | 2010/11/02 05:00:23 | - age 4: 271776 bytes, 1442424 total
INFO | jvm 1 | 2010/11/02 05:00:23 | : 155528K->155689K(179200K), 0.1007840 secs]422366.540: [CMS
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor121]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor119]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor124]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor123]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor120]
INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class sun.reflect.GeneratedMethodAccessor122]
ERROR | wrapper | 2010/11/02 05:02:37 | JVM appears hung: Timed out waiting for signal from JVM.

Dori Rabin
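A note on the numbers in the log: the old-generation occupancy Dori quotes is not printed directly in these ParNew records, but it can be derived from values that do appear. Rough arithmetic, using only figures from the log above:

  total heap capacity                 1022976K
  young generation (ParNew) capacity   179200K
  old generation capacity             ~843776K   (1022976K - 179200K)

  heap used after the 04:58:54 GC      835959K
  young generation used after that GC    1928K
  old generation used                 ~834031K   (about 99% of 843776K)

So by the last successful minor collection the old generation is effectively full, which is consistent with the promotion failure on the very next ParNew at 05:00:23.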
From y.s.ramakrishna at oracle.com Thu Nov 4 09:51:39 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Nov 2010 09:51:39 -0700
Subject: problem in gc with incremental mode
In-Reply-To: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
Message-ID: <4CD2E49B.1080008@oracle.com>

Hi Dori --

What's the version of JDK you are running? Can you share a complete log?
It appears as though the iCMS "auto-pacing" is, for some reason, not kicking in "soon enough"; one workaround is to turn off auto-pacing and use an explicit duty cycle (which has its own disadvantages).

I'd suggest filing a bug, and including a complete log file showing the problem.

thanks.
-- ramki
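For anyone who wants to try the workaround ramki mentions, the explicit duty cycle is set through the i-cms flags sketched below. The flag names are the standard HotSpot i-cms options of this era; the value 50 is purely illustrative and would need tuning for the workload, so treat this as a sketch rather than a recommendation:

  -XX:+UseConcMarkSweepGC
  -XX:+CMSIncrementalMode
  -XX:-CMSIncrementalPacing
  -XX:CMSIncrementalDutyCycle=50

With CMSIncrementalPacing switched off, the duty cycle is no longer adjusted automatically: CMSIncrementalDutyCycle fixes the percentage of the time between minor collections during which the incremental collector is allowed to run. Too low a value recreates the original problem (CMS cannot keep up); too high a value takes proportionally more CPU away from the application.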
From y.s.ramakrishna at oracle.com Thu Nov 4 09:54:52 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Nov 2010 09:54:52 -0700
Subject: problem in gc with incremental mode
In-Reply-To: <4CD2E49B.1080008@oracle.com>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com>
Message-ID: <4CD2E55C.6060505@oracle.com>

Also, consider whether you really need to use iCMS, or whether you could make do with regular CMS (if you have sufficiently many cores) and scale back the number of parallel threads used by CMS marking to reduce the impact on mutators. This can often be a more suitable configuration for multi-core platforms than iCMS (the latter, because of the way it paces itself, can often carry more floating garbage than regular CMS, where the cycle starts and completes more quickly).

-- ramki
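To make the non-incremental alternative concrete, a sketch of what dropping i-cms would look like for this command line is shown below. This is an illustration only, not a tested recommendation for this workload; ParallelCMSThreads is the flag that controls the concurrent marking threads ramki refers to, and it also appears in Alexander's settings later in the thread:

  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:CMSInitiatingOccupancyFraction=60
  -XX:+UseCMSInitiatingOccupancyOnly
  -XX:ParallelCMSThreads=1

That is, the existing flag set minus -XX:+CMSIncrementalMode and -XX:+CMSIncrementalPacing. Without incremental mode, the CMSInitiatingOccupancyFraction=60 already on the command line is honoured, so a background CMS cycle should start at roughly 60% old-generation occupancy instead of waiting on the i-cms pacer.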
From jon.masamitsu at oracle.com Thu Nov 4 10:49:01 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 04 Nov 2010 10:49:01 -0700
Subject: problem in gc with incremental mode
In-Reply-To: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local>
Message-ID: <4CD2F20D.3040506@oracle.com>

Which version of the JDK are you using and what type of platform are you running on?
From Alexander.Livitz at on24.com Sun Nov 7 13:24:57 2010
From: Alexander.Livitz at on24.com (Alexander Livitz)
Date: Sun, 7 Nov 2010 13:24:57 -0800
Subject: problem in gc with incremental mode
In-Reply-To: <4CD2F20D.3040506@oracle.com>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2F20D.3040506@oracle.com>
Message-ID: <4C717D4720DE704383A5D24A4D46E1C681D2A9AFDD@P-HQEXCHANGE.on24.com>

We had a similar problem with promotion failures. The issue was completely solved with the following set:

-Xms12g -Xmx12g -Xmn2g \
-XX:PermSize=384m \
-XX:MaxPermSize=384m \
-XX:-UseGCOverheadLimit \
-XX:+DontYieldALot \
-XX:+UseStringCache \
-XX:+DoEscapeAnalysis \
-XX:+AdjustConcurrency \
-XX:+OptimizeStringConcat \
-XX:ReservedCodeCacheSize=64m \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:SurvivorRatio=1 \
-XX:TargetSurvivorRatio=60 \
-XX:MaxTenuringThreshold=7 \
-XX:+ParallelRefProcEnabled \
-XX:ParallelGCThreads=30 \
-XX:ParallelCMSThreads=24 \
-XX:+CMSPrecleanRefLists2 \
-XX:+CMSScavengeBeforeRemark \
-XX:+CMSClassUnloadingEnabled \
-XX:CMSMaxAbortablePrecleanTime=15000 \
-XX:CMSInitiatingOccupancyFraction=65 \

Alex
From jon.masamitsu at oracle.com Sun Nov 7 20:04:18 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Sun, 07 Nov 2010 20:04:18 -0800
Subject: problem in gc with incremental mode
In-Reply-To:
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com>
Message-ID: <4CD776C2.3070601@oracle.com>

The incremental mode of CMS was implemented for running CMS on machines with a single hardware thread. The idea was that CMS does work concurrently with the application, but with a single hardware thread, when CMS ran in a concurrent phase it would still be stopping the application (CMS would use the single hardware thread and there would be nothing for the application to run on). So the incremental mode of CMS does the concurrent phases in increments and gives up the hardware thread to the application on a regular basis. On a 4 processor machine I would not recommend the incremental mode. If you have not tried the regular CMS (no incremental mode), I recommend that you try it. Overall it is more efficient.

On 11/6/10 11:32 PM, Dori Rabin wrote:
> Thanks for your replies...
> I am running on jdk1.6.0_04 and the OS is Linux 2.4.21-37.ELsmp; I have 4 processors on the machine.
> About sharing the log file, it might be a little problematic right now, so let's discuss it if there is no other option.
> I also thought of getting rid of the incremental mode, but I am afraid of the effect of long pauses on our realtime application due to the stop-the-world phases of CMS....
> I didn't quite understand what exactly you meant by "scale back the # parallel threads used by the CMS..." Is there a parameter I need to set for that? If yes, what should its value be?
> Thanks
> Dori
From y.s.ramakrishna at oracle.com Mon Nov 8 09:11:56 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Mon, 08 Nov 2010 09:11:56 -0800
Subject: problem in gc with incremental mode
In-Reply-To: <4CD776C2.3070601@oracle.com>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com> <4CD776C2.3070601@oracle.com>
Message-ID: <4CD82F5C.20806@oracle.com>

I agree completely with Jon's recommendation. As regards your question:

>>> I didn't quite understand what exactly you meant by "scale back the #
>>> parallel threads used by the CMS..." is there a parameter I need to
>>> set for that ? if yes what should be its value ?

On a 4-processor machine, you get a single concurrent marking thread, so you do not need to do anything specific since you are already at the minimum # of concurrent worker threads used by CMS.

-- ramki
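A small addendum on where that single marking thread comes from: in the HotSpot builds of this period the concurrent marking thread count is controlled by ParallelCMSThreads (the same flag Alexander sets to 24 above), and its default is derived from ParallelGCThreads, roughly (ParallelGCThreads + 3) / 4, which gives 1 on a 4-CPU machine and so matches ramki's statement. Treat the exact formula as approximate, since it is an implementation detail that has varied across releases. On a bigger box the count can be pinned explicitly, for example:

  -XX:ParallelGCThreads=8 -XX:ParallelCMSThreads=2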
>> -- ramki >> >> On 11/04/10 06:27, Rabin Dori wrote: >> > Hi, >> > >> > Once in a while and for a reason I cannot understand the CMS >> kicks up >> > too late which causes a promotion failure and full GC that takes >> very >> > long (more than 2 minutes which causes other problems)? >> > >> > My question is how to tune the gc flags in order to make sure >> that the >> > concurrent sweep will always occur in parallel (incremental mode) >> > without long pause stop the world but also without reaching its >> maximum >> > capacity ? >> > >> > >> > >> > (I know that in my case the *CMSInitiatingOccupancyFraction*=60 is >> > ignored because of the CMSIncrementalMode >> > >> > And from looking in the log file we can see that the old generation >> > reaches size of 835?959K out of 843?000K at the time the concurrent >> > failure (I marked this line in red) >> > >> > >> > >> > *_I am running the jvm with the following parameters :_* >> > >> > wrapper.java.additional.4=-XX:NewSize=200m >> > >> > wrapper.java.additional.5=-XX:SurvivorRatio=6 >> > >> > wrapper.java.additional.6=-XX:MaxTenuringThreshold=4 >> > >> > wrapper.java.additional.7=-XX:+CMSIncrementalMode >> > >> > wrapper.java.additional.8=-XX:+CMSIncrementalPacing >> > >> > wrapper.java.additional.9=-XX:+DisableExplicitGC >> > >> > wrapper.java.additional.10=-XX:+UseConcMarkSweepGC >> > >> > wrapper.java.additional.11=-XX:+CMSClassUnloadingEnabled >> > >> > wrapper.java.additional.12=-XX:+PrintGCDetails >> > >> > wrapper.java.additional.13=-XX:+PrintGCTimeStamps >> > >> > wrapper.java.additional.14=-XX:-TraceClassUnloading >> > >> > wrapper.java.additional.15=-XX:+HeapDumpOnOutOfMemoryError >> > >> > wrapper.java.additional.16=-verbose:gc >> > >> > wrapper.java.additional.17=-XX:CMSInitiatingOccupancyFraction=60 >> > >> > wrapper.java.additional.18=-XX:+UseCMSInitiatingOccupancyOnly >> > >> > wrapper.java.additional.19=-XX:+PrintTenuringDistribution >> > >> > >> > >> > >> > >> > *_Extracts from the log file:_* >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | Desired survivor size >> 13107200 >> > bytes, new threshold 4 (max 4) >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 1: 544000 >> > bytes, 544000 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 2: 346320 >> > bytes, 890320 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 3: 262800 >> > bytes, 1153120 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 4: 238528 >> > bytes, 1391648 total >> > >> > INFO | jvm 1 | 2010/11/02 04:54:33 | : 155621K->2065K(179200K), >> > 0.1046330 secs] 988712K->835373K(1022976K) icms_dc=0 , 0.1047500 >> secs] >> > [Times: user=0.00 sys=0.00, real=0.11 secs] >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | 422097.583: [GC >> 422097.583: >> > [ParNew >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | Desired survivor size >> 13107200 >> > bytes, new threshold 4 (max 4) >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 1: 577104 >> > bytes, 577104 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 2: 261856 >> > bytes, 838960 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 3: 298832 >> > bytes, 1137792 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 4: 259176 >> > bytes, 1396968 total >> > >> > INFO | jvm 1 | 2010/11/02 04:55:54 | : 155664K->2386K(179200K), >> > 0.0498010 secs] 988972K->835920K(1022976K) icms_dc=0 , 0.0499370 >> secs] >> > [Times: user=0.00 sys=0.00, real=0.05 secs] >> > >> > INFO | jvm 1 | 2010/11/02 04:57:27 | 422190.993: [GC >> 
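A minimal sketch of what Jon's suggestion above amounts to, in plain JVM-flag
form (illustrative only: once incremental mode is removed, the occupancy
settings already in the original flag list start to take effect, and pause
times should be re-measured under load before settling on anything):

    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=60
    -XX:+UseCMSInitiatingOccupancyOnly

i.e. drop -XX:+CMSIncrementalMode and -XX:+CMSIncrementalPacing and leave the
NewSize/SurvivorRatio/MaxTenuringThreshold and GC logging flags as they were.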
From darragh.curran at gmail.com Wed Nov 10 07:47:01 2010
From: darragh.curran at gmail.com (Darragh Curran)
Date: Wed, 10 Nov 2010 15:47:01 +0000
Subject: Full GC run freeing no memory despite many unreachable objects
Message-ID:

Hi,

I hope someone can help me understand this better.

I'm running java version "Java(TM) SE Runtime Environment (build
1.6.0_19-b04) Java HotSpot(TM) Server VM (build 16.2-b04, mixed mode)"

It runs tomcat, with jvm heap options '-Xmx1600m -Xms1600m'

Every few weeks a host becomes unavailable for requests. When I look into
it, the java process is using 100% CPU, seems to have no running threads
when I do multiple stack dumps, and based on jstat output appears to spend
all its time doing full gc runs:

14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1107 5656.903 6945.644
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686

I used jmap to dump the heap for analysis. When I analyse it I see that 94%
of objects are unreachable, but not yet collected.

From jstat it appears that gc is running roughly every 10 seconds and
lasting approx 10 seconds, but fails to free any memory.
I'd really appreciate some advice on how to better understand my problem
and what to do to try and fix it.

Thanks,
Darragh

From jon.masamitsu at oracle.com Wed Nov 10 09:03:26 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 10 Nov 2010 09:03:26 -0800
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To:
References:
Message-ID: <4CDAD05E.7080602@oracle.com>

I assume that the Java process does not recover on its own and has to be
killed?

What are the column headings for your jstat output?

Turn on GC logging (if you don't already have it on) and check to see if
your perm gen is full.

If your perm gen is not full but the Java heap appears to be full, then the
garbage collector just thinks that all that data is live. You used jmap.
Do you know what's filling up the heap?

On 11/10/2010 7:47 AM, Darragh Curran wrote:
> Hi,
>
> I hope someone can help me understand this better.
>
> I'm running java version "Java(TM) SE Runtime Environment (build
> 1.6.0_19-b04) Java HotSpot(TM) Server VM (build 16.2-b04, mixed mode)"
>
> It runs tomcat, with jvm heap options '-Xmx1600m -Xms1600m'
>
> Every few weeks a host becomes unavailable for requests. When I look
> into it, the java process is using 100% CPU, seems to have no running
> threads when I do multiple stack dumps, and based on jstat output
> appears to spend all its time doing full gc runs:
> [...]
>
> I used jmap to dump the heap for analysis. When I analyse it I see
> that 94% of objects are unreachable, but not yet collected.
>
> From jstat it appears that gc is running roughly every 10 seconds and
> lasting approx 10 seconds, but fails to free any memory.
>
> I'd really appreciate some advice on how to better understand my
> problem and what to do to try and fix it.
> > Thanks, > Darragh > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From darragh.curran at gmail.com Wed Nov 10 09:22:05 2010 From: darragh.curran at gmail.com (Darragh Curran) Date: Wed, 10 Nov 2010 17:22:05 +0000 Subject: Full GC run freeing no memory despite many unreachable objects In-Reply-To: <4CDAD05E.7080602@oracle.com> References: <4CDAD05E.7080602@oracle.com> Message-ID: Ooops, Here's the jstat output with headings: S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060013.9 37376.0 37182.0 38654 1288.741 1105 5636.779 6925.520 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.0 37376.0 37182.0 38654 1288.741 1106 5646.857 6935.598 ... ... ... 
14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1107 5656.903 6945.644 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1107 5656.903 6945.644 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1108 5666.547 6955.287 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 14400.0 13824.0 0.0 0.0 517888.0 517888.0 1092288.0 1060014.1 37376.0 37182.0 38654 1288.741 1109 5675.945 6964.686 It looks like the perm gen is full (~99%) The process doesn't recover on its own. 
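(For reference, output in this shape -- S0C through GCT -- is what a plain

    jstat -gc <tomcat-pid> 10000

prints; the exact command and sampling interval used here are an assumption,
shown only so the column headings are reproducible.)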
Here's some other gc output from our logs:

Heap
 PSYoungGen      total 531712K, used 517887K [0x92d80000, 0xb42d0000, 0xb42d0000)
  eden space 517888K, 99% used [0x92d80000,0xb273fff8,0xb2740000)
  from space 13824K, 0% used [0xb2740000,0xb2740000,0xb34c0000)
  to   space 14400K, 0% used [0xb34c0000,0xb34c0000,0xb42d0000)
 PSOldGen        total 1092288K, used 1060042K [0x502d0000, 0x92d80000, 0x92d80000)
  object space 1092288K, 97% used [0x502d0000,0x90e02a20,0x92d80000)
 PSPermGen       total 37376K, used 37186K [0x4c2d0000, 0x4e750000, 0x502d0000)
  object space 37376K, 99% used [0x4c2d0000,0x4e720b60,0x4e750000)

Nov 9, 2010 9:29:48 AM sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop
WARNING: RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=36857] throws
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at sun.rmi.runtime.NewThreadAction.run(NewThreadAction.java:116)
        at sun.rmi.runtime.NewThreadAction.run(NewThreadAction.java:34)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$1.newThread(TCPTransport.java:94)
        at java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:672)
        at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:721)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:384)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
        at java.lang.Thread.run(Thread.java:619)

From jmap I can see that most of the unreachable objects are
LinkedHashMap.Entry objects that I know are created as part of many requests.

Darragh

On Wed, Nov 10, 2010 at 5:03 PM, Jon Masamitsu wrote:
> I assume that the Java process does not recover on its
> own and has to be killed?
> [...]

From y.s.ramakrishna at oracle.com Wed Nov 10 10:03:56 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Wed, 10 Nov 2010 10:03:56 -0800
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To:
References: <4CDAD05E.7080602@oracle.com>
Message-ID: <4CDADE8C.6000503@oracle.com>

Have you tried, say, doubling the size of your heap (old gen and perm gen)
to see what happens to the problem? Maybe you are using a heap that is much
too small for your application's natural footprint on the platform on which
you are running it? If you believe the process heap needs should not be so
large, check what the heap contains -- does a jmap -histo:live show many
objects as live that you believe should have been collected? Have you tried
to determine, using a tool such as jhat on these jmap heap dumps, whether
your application has a real leak after all?

If you have determined that you do not have a leak, can you share a test
case? (Also, please first try the latest 6u23 JDK before you try to design
a test case.)
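For instance (illustrative invocations and sizes only -- substitute the real
tomcat pid, and re-measure before settling on numbers):

    jmap -histo:live <pid> | head -30
    jmap -dump:live,format=b,file=tomcat-heap.hprof <pid>
    jhat -J-Xmx2g tomcat-heap.hprof

and, for the sizing experiment, something like -Xms3200m -Xmx3200m
-XX:MaxPermSize=128m in place of the current 1600m settings.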
best regards.
-- ramki

On 11/10/10 09:22, Darragh Curran wrote:
> Ooops, Here's the jstat output with headings:
> [...]

From darragh.curran at gmail.com Thu Nov 11 03:50:55 2010
From: darragh.curran at gmail.com (Darragh Curran)
Date: Thu, 11 Nov 2010 11:50:55 +0000
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To: <4CDADE8C.6000503@oracle.com>
References: <4CDAD05E.7080602@oracle.com> <4CDADE8C.6000503@oracle.com>
Message-ID:

Thanks Ramki,

I know what I really need to do is find how to consistently reproduce this,
then look at ways to either fix a problem with our code, tune gc settings
or (least likely) discover a bug in hotspot.

Before I do that, I was wondering if anyone had insight into what would
cause the expensive full gc runs consuming 100% CPU and lasting for ~10
seconds without freeing any memory, despite there being many unreachable
objects (according to the heap dump).

Is there a cap on the length of a full GC run?
Perhaps it's taking all its time to traverse the heap and gets cancelled
before it does any collecting?

Best regards,
Darragh

On Wed, Nov 10, 2010 at 6:03 PM, Y. S. Ramakrishna wrote:
> Have you tried say doubling the size of your heap (old gen and
> perm gen) to see what happens to the problem? [...]

From jon.masamitsu at oracle.com Thu Nov 11 06:48:09 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 11 Nov 2010 06:48:09 -0800
Subject: Full GC run freeing no memory despite many unreachable objects
In-Reply-To:
References: <4CDAD05E.7080602@oracle.com> <4CDADE8C.6000503@oracle.com>
Message-ID: <4CDC0229.9040600@oracle.com>

Darragh,

The GC thinks that all the data in the heap is live so that is why no space
is being recovered at a full GC. There is no time limit on the collections
(for this collector). The full collection does complete.

I don't know why the tools you've used say the objects are unreachable.
My tendency is, of course, to believe the GC before the tools :-).

Jon

On 11/11/10 03:50, Darragh Curran wrote:
> Thanks Ramki,
>
> I know what I really need to do is find how to consistently reproduce
> this, then look at ways to either fix a problem with our code, tune gc
> settings or (least likely) discover a bug in hotspot. [...]

From shane.cox at gmail.com Fri Nov 12 11:31:53 2010
From: shane.cox at gmail.com (Shane Cox)
Date: Fri, 12 Nov 2010 14:31:53 -0500
Subject: Setting Min and Max Heap Size to the same value affects size of Young Gen
Message-ID:

This is probably well known behavior, but why does setting the Min and Max
Heap Size to the same value affect the default size of the Young Generation?
For example: Scenario 1: -d64 -Xms1536m -Xmx4096m -XX:+UseConcMarkSweepGC Young Generation is small: 18,624K {Heap before GC invocations=0 (full 0): par new generation total 18624K, used 16000K [0xfffffd7ef4e00000, 0xfffffd7ef62c0000, 0xfffffd7f05c60000) eden space 16000K, 100% used [0xfffffd7ef4e00000, 0xfffffd7ef5da0000, 0xfffffd7ef5da0000) from space 2624K, 0% used [0xfffffd7ef5da0000, 0xfffffd7ef5da0000, 0xfffffd7ef6030000) to space 2624K, 0% used [0xfffffd7ef6030000, 0xfffffd7ef6030000, 0xfffffd7ef62c0000) concurrent mark-sweep generation total 1551616K, used 0K [0xfffffd7f05c60000, 0xfffffd7f647a0000, 0xfffffd7ff4e00000) concurrent-mark-sweep perm gen total 21248K, used 7265K [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) 2010-11-12T14:00:16.083-0500: 0.364: [GC 0.364: [ParNew: 16000K->2150K(18624K), 0.0048839 secs] 16000K->2150K(1570240K), 0.0049538 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] Scenario 2: -d64 -Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC Young Generation is much larger: 172,032K {Heap before GC invocations=0 (full 0): par new generation total 172032K, used 147456K [0xfffffd7f94e00000, 0xfffffd7fa0e00000, 0xfffffd7fa0e00000) eden space 147456K, 100% used [0xfffffd7f94e00000, 0xfffffd7f9de00000, 0xfffffd7f9de00000) from space 24576K, 0% used [0xfffffd7f9de00000, 0xfffffd7f9de00000, 0xfffffd7f9f600000) to space 24576K, 0% used [0xfffffd7f9f600000, 0xfffffd7f9f600000, 0xfffffd7fa0e00000) concurrent mark-sweep generation total 1376256K, used 0K [0xfffffd7fa0e00000, 0xfffffd7ff4e00000, 0xfffffd7ff4e00000) concurrent-mark-sweep perm gen total 21248K, used 12639K [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) 2010-11-12T11:53:01.376-0500: 360.088: [GC 360.088: [ParNew: 147456K->7373K(172032K), 0.0093910 secs] 147456K->7373K(1548288K), 0.0094709 secs] [Times: user=0.03 sys=0.02, real=0.01 secs] jinfo reports the value of -XX:CMSYoungGenPerWorker=16777216 in both scenarios, as well as -XX:ParallelGCThreads=13. So it's unclear to me why the Young Gen would be so small when -Xms and -Xmx are different values (Scenario 1). java -version java version "1.6.0_14" Java(TM) SE Runtime Environment (build 1.6.0_14-b08) Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) Any insights would be appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101112/e59a9f83/attachment.html From y.s.ramakrishna at oracle.com Fri Nov 12 13:24:46 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 12 Nov 2010 13:24:46 -0800 Subject: Setting Min and Max Heap Size to the same value affects size of Young Gen In-Reply-To: References: Message-ID: <4CDDB09E.1000302@oracle.com> Sounds like a bug to me. I'll check the latest JDK when i get the chance. -- ramki On 11/12/2010 11:31 AM, Shane Cox wrote: > This is probably well known behavior, but why does setting the Min and Max > Heap Size to the same value affect the default size of the Young > Generation? 
For example: > > Scenario 1: > -d64 -Xms1536m -Xmx4096m -XX:+UseConcMarkSweepGC > > Young Generation is small: 18,624K > > {Heap before GC invocations=0 (full 0): > par new generation total 18624K, used 16000K [0xfffffd7ef4e00000, > 0xfffffd7ef62c0000, 0xfffffd7f05c60000) > eden space 16000K, 100% used [0xfffffd7ef4e00000, 0xfffffd7ef5da0000, > 0xfffffd7ef5da0000) > from space 2624K, 0% used [0xfffffd7ef5da0000, 0xfffffd7ef5da0000, > 0xfffffd7ef6030000) > to space 2624K, 0% used [0xfffffd7ef6030000, 0xfffffd7ef6030000, > 0xfffffd7ef62c0000) > concurrent mark-sweep generation total 1551616K, used 0K > [0xfffffd7f05c60000, 0xfffffd7f647a0000, 0xfffffd7ff4e00000) > concurrent-mark-sweep perm gen total 21248K, used 7265K > [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) > 2010-11-12T14:00:16.083-0500: 0.364: [GC 0.364: [ParNew: > 16000K->2150K(18624K), 0.0048839 secs] 16000K->2150K(1570240K), 0.0049538 > secs] [Times: user=0.02 sys=0.01, real=0.01 secs] > > > > Scenario 2: > -d64 -Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC > > Young Generation is much larger: 172,032K > > {Heap before GC invocations=0 (full 0): > par new generation total 172032K, used 147456K [0xfffffd7f94e00000, > 0xfffffd7fa0e00000, 0xfffffd7fa0e00000) > eden space 147456K, 100% used [0xfffffd7f94e00000, 0xfffffd7f9de00000, > 0xfffffd7f9de00000) > from space 24576K, 0% used [0xfffffd7f9de00000, 0xfffffd7f9de00000, > 0xfffffd7f9f600000) > to space 24576K, 0% used [0xfffffd7f9f600000, 0xfffffd7f9f600000, > 0xfffffd7fa0e00000) > concurrent mark-sweep generation total 1376256K, used 0K > [0xfffffd7fa0e00000, 0xfffffd7ff4e00000, 0xfffffd7ff4e00000) > concurrent-mark-sweep perm gen total 21248K, used 12639K > [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) > 2010-11-12T11:53:01.376-0500: 360.088: [GC 360.088: [ParNew: > 147456K->7373K(172032K), 0.0093910 secs] 147456K->7373K(1548288K), 0.0094709 > secs] [Times: user=0.03 sys=0.02, real=0.01 secs] > > > > jinfo reports the value of -XX:CMSYoungGenPerWorker=16777216 in both > scenarios, as well as -XX:ParallelGCThreads=13. So it's unclear to me why > the Young Gen would be so small when -Xms and -Xmx are different values > (Scenario 1). > > > java -version > java version "1.6.0_14" > Java(TM) SE Runtime Environment (build 1.6.0_14-b08) > Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) > > > Any insights would be appreciated. > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From y.s.ramakrishna at oracle.com Mon Nov 15 10:22:20 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Mon, 15 Nov 2010 10:22:20 -0800 Subject: Setting Min and Max Heap Size to the same value affects size of Young Gen In-Reply-To: <4CDDB09E.1000302@oracle.com> References: <4CDDB09E.1000302@oracle.com> Message-ID: <4CE17A5C.3080202@oracle.com> Hi Shane -- Yes, the bug exists also in the latest JDK 7 builds. I am looking into it and filed the following bug to track it:- 7000125 CMS: Anti-monotone young gen sizing with respect to maximum whole heap size specification -- ramki On 11/12/10 13:24, Y. Srinivas Ramakrishna wrote: > Sounds like a bug to me. I'll check the latest JDK when i get the chance. > > -- ramki > > On 11/12/2010 11:31 AM, Shane Cox wrote: >> This is probably well known behavior, but why does setting the Min and Max >> Heap Size to the same value affect the default size of the Young >> Generation? 
For example: >> >> Scenario 1: >> -d64 -Xms1536m -Xmx4096m -XX:+UseConcMarkSweepGC >> >> Young Generation is small: 18,624K >> >> {Heap before GC invocations=0 (full 0): >> par new generation total 18624K, used 16000K [0xfffffd7ef4e00000, >> 0xfffffd7ef62c0000, 0xfffffd7f05c60000) >> eden space 16000K, 100% used [0xfffffd7ef4e00000, 0xfffffd7ef5da0000, >> 0xfffffd7ef5da0000) >> from space 2624K, 0% used [0xfffffd7ef5da0000, 0xfffffd7ef5da0000, >> 0xfffffd7ef6030000) >> to space 2624K, 0% used [0xfffffd7ef6030000, 0xfffffd7ef6030000, >> 0xfffffd7ef62c0000) >> concurrent mark-sweep generation total 1551616K, used 0K >> [0xfffffd7f05c60000, 0xfffffd7f647a0000, 0xfffffd7ff4e00000) >> concurrent-mark-sweep perm gen total 21248K, used 7265K >> [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) >> 2010-11-12T14:00:16.083-0500: 0.364: [GC 0.364: [ParNew: >> 16000K->2150K(18624K), 0.0048839 secs] 16000K->2150K(1570240K), 0.0049538 >> secs] [Times: user=0.02 sys=0.01, real=0.01 secs] >> >> >> >> Scenario 2: >> -d64 -Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC >> >> Young Generation is much larger: 172,032K >> >> {Heap before GC invocations=0 (full 0): >> par new generation total 172032K, used 147456K [0xfffffd7f94e00000, >> 0xfffffd7fa0e00000, 0xfffffd7fa0e00000) >> eden space 147456K, 100% used [0xfffffd7f94e00000, 0xfffffd7f9de00000, >> 0xfffffd7f9de00000) >> from space 24576K, 0% used [0xfffffd7f9de00000, 0xfffffd7f9de00000, >> 0xfffffd7f9f600000) >> to space 24576K, 0% used [0xfffffd7f9f600000, 0xfffffd7f9f600000, >> 0xfffffd7fa0e00000) >> concurrent mark-sweep generation total 1376256K, used 0K >> [0xfffffd7fa0e00000, 0xfffffd7ff4e00000, 0xfffffd7ff4e00000) >> concurrent-mark-sweep perm gen total 21248K, used 12639K >> [0xfffffd7ff4e00000, 0xfffffd7ff62c0000, 0xfffffd7ffa200000) >> 2010-11-12T11:53:01.376-0500: 360.088: [GC 360.088: [ParNew: >> 147456K->7373K(172032K), 0.0093910 secs] 147456K->7373K(1548288K), 0.0094709 >> secs] [Times: user=0.03 sys=0.02, real=0.01 secs] >> >> >> >> jinfo reports the value of -XX:CMSYoungGenPerWorker=16777216 in both >> scenarios, as well as -XX:ParallelGCThreads=13. So it's unclear to me why >> the Young Gen would be so small when -Xms and -Xmx are different values >> (Scenario 1). >> >> >> java -version >> java version "1.6.0_14" >> Java(TM) SE Runtime Environment (build 1.6.0_14-b08) >> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) >> >> >> Any insights would be appreciated. >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From dorirabin at gmail.com Mon Nov 8 09:27:13 2010 From: dorirabin at gmail.com (Dori Rabin) Date: Mon, 8 Nov 2010 19:27:13 +0200 Subject: problem in gc with incremental mode In-Reply-To: <4CD82F5C.20806@oracle.com> References: <983CFBCFF00E9A498F2703DBD7155DC7295D95162D@ISR-IT-EX-01.starhome.local> <4CD2E49B.1080008@oracle.com> <4CD776C2.3070601@oracle.com> <4CD82F5C.20806@oracle.com> Message-ID: Well thank you both I will try these recommendations hoping for the best Thanks Dori On Mon, Nov 8, 2010 at 7:11 PM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > I agree completely with Jon's recommendation. 
As regards, your question:- > > > I didn't quiet understand what exactly you meant by "scale back the # >>>> parallel threads used by the CMS..." is there a parameter I need to set for >>>> that ? if yes what should be its value ? >>>> >>> > On a 4-processor machine, you get a single concurrent marking thread, so > you do not need to do anything specific since you are already at the > minimum > # concurrent worker threads used by CMS. > > -- ramki > > > On 11/07/10 20:04, Jon Masamitsu wrote: > >> The incremental mode of CMS was implemented for running CMS on machines >> with a single hardware thread. The idea was that CMS does work >> concurrently >> with the application but with a single hardware thread, when CMS ran in >> a concurrent phase, it would still be stopping the application (CMS would >> use the single hardware thread and there would be nothing for the >> application >> to run on). So the incremental mode of CMS does the concurrent phases >> in increments and gives up the hardware thread to the application on >> a regular basis. On a 4 processor machine I would not recommend the >> incremental >> mode. If you have not tried the regular CMS (no incremental mode), >> I recommend that you try it. Overall it is more efficient. >> >> On 11/6/10 11:32 PM, Dori Rabin wrote: >> >>> Thanks fo your replies... >>> I am running on jdk1.6.0_04 and OS is Linux 2.4.21-37.ELsmp, I have 4 >>> processors on the machine >>> About sharing the log file it might be a little problematic now so let's >>> discuss it if no other option I also thought of getting rid of the >>> incremental mode but I am afraid of the effect of long pauses on our >>> realtime application due to the stop the world phase of CMS.... >>> I didn't quiet understand what exactly you meant by "scale back the # >>> parallel threads used by the CMS..." is there a parameter I need to set for >>> that ? if yes what should be its value ? >>> Thanks >>> Dori >>> On Thu, Nov 4, 2010 at 6:51 PM, Y. S. Ramakrishna < >>> y.s.ramakrishna at oracle.com > wrote: >>> >>> Hi Dori -- >>> >>> What's the version of JDK you are running? Can you share a >>> complete log? >>> It appears as though the iCMS "auto-pacing" is, for some reason, not >>> kicking in "soon enough"; one workaround is to use turn off >>> auto-pacing >>> and use an explicit duty-cycle (which has its own disadvantages). >>> >>> I'd suggest filing a bug, and including a complete log file showing >>> the problem. >>> >>> thanks. >>> -- ramki >>> >>> On 11/04/10 06:27, Rabin Dori wrote: >>> > Hi, >>> > >>> > Once in a while and for a reason I cannot understand the CMS >>> kicks up >>> > too late which causes a promotion failure and full GC that takes >>> very >>> > long (more than 2 minutes which causes other problems)? >>> > >>> > My question is how to tune the gc flags in order to make sure >>> that the >>> > concurrent sweep will always occur in parallel (incremental mode) >>> > without long pause stop the world but also without reaching its >>> maximum >>> > capacity ? 
>>> > >>> > >>> > >>> > (I know that in my case the *CMSInitiatingOccupancyFraction*=60 is >>> > ignored because of the CMSIncrementalMode >>> > >>> > And from looking in the log file we can see that the old generation >>> > reaches size of 835?959K out of 843?000K at the time the concurrent >>> > failure (I marked this line in red) >>> > >>> > >>> > >>> > *_I am running the jvm with the following parameters :_* >>> > >>> > wrapper.java.additional.4=-XX:NewSize=200m >>> > >>> > wrapper.java.additional.5=-XX:SurvivorRatio=6 >>> > >>> > wrapper.java.additional.6=-XX:MaxTenuringThreshold=4 >>> > >>> > wrapper.java.additional.7=-XX:+CMSIncrementalMode >>> > >>> > wrapper.java.additional.8=-XX:+CMSIncrementalPacing >>> > >>> > wrapper.java.additional.9=-XX:+DisableExplicitGC >>> > >>> > wrapper.java.additional.10=-XX:+UseConcMarkSweepGC >>> > >>> > wrapper.java.additional.11=-XX:+CMSClassUnloadingEnabled >>> > >>> > wrapper.java.additional.12=-XX:+PrintGCDetails >>> > >>> > wrapper.java.additional.13=-XX:+PrintGCTimeStamps >>> > >>> > wrapper.java.additional.14=-XX:-TraceClassUnloading >>> > >>> > wrapper.java.additional.15=-XX:+HeapDumpOnOutOfMemoryError >>> > >>> > wrapper.java.additional.16=-verbose:gc >>> > >>> > wrapper.java.additional.17=-XX:CMSInitiatingOccupancyFraction=60 >>> > >>> > wrapper.java.additional.18=-XX:+UseCMSInitiatingOccupancyOnly >>> > >>> > wrapper.java.additional.19=-XX:+PrintTenuringDistribution >>> > >>> > >>> > >>> > >>> > >>> > *_Extracts from the log file:_* >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 1: 544000 >>> > bytes, 544000 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 2: 346320 >>> > bytes, 890320 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 3: 262800 >>> > bytes, 1153120 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | - age 4: 238528 >>> > bytes, 1391648 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:54:33 | : 155621K->2065K(179200K), >>> > 0.1046330 secs] 988712K->835373K(1022976K) icms_dc=0 , 0.1047500 >>> secs] >>> > [Times: user=0.00 sys=0.00, real=0.11 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | 422097.583: [GC >>> 422097.583: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 1: 577104 >>> > bytes, 577104 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 2: 261856 >>> > bytes, 838960 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 3: 298832 >>> > bytes, 1137792 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | - age 4: 259176 >>> > bytes, 1396968 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:55:54 | : 155664K->2386K(179200K), >>> > 0.0498010 secs] 988972K->835920K(1022976K) icms_dc=0 , 0.0499370 >>> secs] >>> > [Times: user=0.00 sys=0.00, real=0.05 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:27 | 422190.993: [GC >>> 422190.993: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 1: 676656 >>> > bytes, 676656 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 2: 283376 >>> > bytes, 960032 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 3: 239472 
>>> > bytes, 1199504 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | - age 4: 264960 >>> > bytes, 1464464 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:57:28 | : 155986K->1918K(179200K), >>> > 0.0652010 secs] 989520K->835699K(1022976K) icms_dc=0 , 0.0653200 >>> secs] >>> > [Times: user=0.01 sys=0.00, real=0.07 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | 422277.406: [GC >>> 422277.406: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 1: 615944 >>> > bytes, 615944 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 2: 334120 >>> > bytes, 950064 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 3: 276736 >>> > bytes, 1226800 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | - age 4: 236424 >>> > bytes, 1463224 total >>> > >>> > INFO | jvm 1 | 2010/11/02 04:58:54 | : 155518K->1928K(179200K), >>> > 0.0378730 secs] 989299K->835959K(1022976K) icms_dc=0 , 0.0379920 >>> secs] >>> > [Times: user=0.00 sys=0.00, real=0.04 secs] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | 422366.439: [GC >>> 422366.439: >>> > [ParNew >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | (promotion failed) >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | Desired survivor size >>> 13107200 >>> > bytes, new threshold 4 (max 4) >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 1: 574000 >>> > bytes, 574000 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 2: 315432 >>> > bytes, 889432 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 3: 281216 >>> > bytes, 1170648 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | - age 4: 271776 >>> > bytes, 1442424 total >>> > >>> > INFO | jvm 1 | 2010/11/02 05:00:23 | : >>> 155528K->155689K(179200K), >>> > 0.1007840 secs]422366.540: [CMS >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor121] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor119] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor124] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor123] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor120] >>> > >>> > INFO | jvm 1 | 2010/11/02 05:01:46 | [Unloading class >>> > sun.reflect.GeneratedMethodAccessor122] >>> > >>> > ERROR | wrapper | 2010/11/02 05:02:37 | JVM appears hung: >>> Timed out >>> > waiting for signal from JVM. >>> > >>> > >>> > >>> > >>> > >>> > *Dori Rabin* >>> > >>> > *cid:image001.gif at 01CB69E7.E5E45760* >>> > >>> > >>> > >>> > *cid:image002.jpg at 01CB69E7.E5E45760* >>> > >>> > T. +972-3-123-4567 F. +972-3- 766-3559 M. +972-54- 4232-706 >>> > >>> > Email: mailto:Dori >> >>> >.Rabin at starhome.com >>> >>> >>> > >>> > >>> > >>> > >>> > >>> > *cid:image003.gif at 01CB69E7.E5E45760* >>> > *cid:image004.gif at 01CB69E7.E5E45760* >>> > *cid:image005.gif at 01CB69E7.E5E45760* >>> > *cid:image006.gif at 01CB69E7.E5E45760* >>> > *cid:image007.gif at 01CB69E7.E5E45760* >>> > >>> > >>> > This email contains proprietary and/or confidential information of >>> > Starhome. If you >>> > >>> > have received this email in error, please delete all copies without >>> > delay and do not >>> > >>> > copy, distribute, or rely on any information contained in this >>> email. 
>>> > Thank you! >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> >>> ------------------------------------------------------------------------ >>> > >>> > _______________________________________________ >>> > hotspot-gc-use mailing list >>> > hotspot-gc-use at openjdk.java.net >>> >>> >>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101108/c66c2d10/attachment-0001.html From brian.williams at mayalane.com Mon Nov 15 11:29:26 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Mon, 15 Nov 2010 14:29:26 -0500 Subject: CMS Promotion Failures Message-ID: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Greetings, We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. Here are the general principles that we've arrived at to delay the promotion failure: 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. 2. Use as large of heap as possible regardless of the size of the database cache that's needed. 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. A few questions 1. Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? 2. Would running with -XX:+AlwaysPreTouch make any difference? 3. We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? 4. Would changing any of the PLAB/TLAB settings make a difference? 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? 6. Are there any other JVM settings that we should try, other advice? By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. Sorry for all of the questions. We definitely appreciate any help you can offer. Brian From y.s.ramakrishna at oracle.com Mon Nov 15 14:16:05 2010 From: y.s.ramakrishna at oracle.com (Y. S. 
Ramakrishna) Date: Mon, 15 Nov 2010 14:16:05 -0800 Subject: CMS Promotion Failures In-Reply-To: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Message-ID: <4CE1B125.2020302@oracle.com> On 11/15/10 11:29, Brian Williams wrote: > Greetings, > We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. > > Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. > > Here are the general principles that we've arrived at to delay the promotion failure: > > 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. > > 2. Use as large of heap as possible regardless of the size of the database cache that's needed. > > 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. > > A few questions > > 1. Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? > Somewhere in between. My experience has been that you want yr CMS cycles to be neither too frequent, nor too infrequent. > 2. Would running with -XX:+AlwaysPreTouch make any difference? Only initially, until all of the old gen pages get objects promoted into them. On Solaris at least there is sometimes a cost from first touch, expecially if using very large pages. The pre-touch moves that cost out of the scavenges to the start-up phase. > > 3. We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? > The basic idea is as you say in (1), promote only medium- and long-lived data. In other words, never promote any short-lived data, even under sudden load spikes. > 4. Would changing any of the PLAB/TLAB settings make a difference? These are autonomically sized and it's unlikely that a static setting will outperform the adaption, epsecially if you do not have steady loads. > > 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? > Yes. :-) (More seriously the cost is proportional to the amount copied, i.e. live data, and the size of the heap, i.e. also the dead data; the overhead is also slightly higher if you have many small as opposed to a few large objects.) > 6. Are there any other JVM settings that we should try, other advice? Controlling promotion rate and avoiding premature promotion of short-lived data is the most important piece of advice. 
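As a concrete illustration of that advice -- the numbers below are only hypothetical starting points, not values taken from this thread -- a promotion-controlling flag set typically pairs a generous young generation and roomy survivor spaces with a tenuring threshold high enough that short-lived objects die before they tenure, plus the tenuring distribution output to verify it:

    -Xmn2g -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=8
    -XX:+PrintTenuringDistribution

The PrintTenuringDistribution lines (of the kind quoted earlier in this digest) are the check: if the per-age byte totals shrink to near zero before the maximum age is reached, the objects that still get promoted really are medium- or long-lived, and the remaining promotion is the unavoidable part.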
> > By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. Try giving G1 a bit more heap, and instead of constraining generation sizes to what worked best for CMS, just specify a pause-time (start higher and slowly iterate lower) and let G1's autonomics find an optimal partitioning of the heap. There are probably a few not yet known sharp corners of G1 that if you bring to our attention we can try and fix. One current disadvantage of G1 which is planned to be fixed soon, is that we do not deal with Reference onjects during scavenges, so this can place G1 at a great disadvantage in terms of carrying a lot more garbage, if your application happens to use Reference objects (perhaps under the covers by the JDK libraries that you are using). Look at the GC tuning talk by Charlie Hunt and Tony Printezis in this year's JavaOne for some good advice on GC tuning in general and CMS tuning in particular. Hopefully they will also include G1 tuning into such a talk next year :-) best. -- ramki > > Sorry for all of the questions. We definitely appreciate any help you can offer. > > Brian > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From shaun.hennessy at alcatel-lucent.com Mon Nov 15 17:43:17 2010 From: shaun.hennessy at alcatel-lucent.com (Shaun Hennessy) Date: Mon, 15 Nov 2010 20:43:17 -0500 Subject: CMS Promotion Failures In-Reply-To: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Message-ID: <4CE1E1B5.3080302@alcatel-lucent.com> Brian Williams wrote: > Greetings, > We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. > > Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. > > Here are the general principles that we've arrived at to delay the promotion failure: > > 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. > > 2. Use as large of heap as possible regardless of the size of the database cache that's needed. > > 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. > > A few questions > If the goal is just avoiding/delaying promotion failures the above all sound like good ideas to achieve your goal -- if any negatives they cause aren't a problem. As for the below I would set CMSInitiatingOccupancyFraction to lower value (ie 75, or 65, etc..) - if you set the value low enough, triggering CMS collection sooner -- could you not avoid promotion failures? I assume the point is to avoid the STW time spent by the promotion failure --- and manually triggering periodic System.gc()'s while app is running normally would be no better than the promotion failures in the first place? ( Throughput collector is not an option?) > 1. 
Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? > > 2. Would running with -XX:+AlwaysPreTouch make any difference? > > 3. We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? > > 4. Would changing any of the PLAB/TLAB settings make a difference? > > 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? > > 6. Are there any other JVM settings that we should try, other advice? > > By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. > > Sorry for all of the questions. We definitely appreciate any help you can offer. > > Brian > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From jon.masamitsu at oracle.com Mon Nov 15 18:31:04 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Mon, 15 Nov 2010 18:31:04 -0800 Subject: CMS Promotion Failures In-Reply-To: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> Message-ID: <4CE1ECE8.9040804@oracle.com> Brian, Ramki and Shaun have addressed most of your questions I think. Just wanted to know what type of platform (how many hardware threads) you're using. Also what is CMS doing when the promotion failures are happening (concurrent marking, preclean cleaning or sweeping)? Jon On 11/15/2010 11:29 AM, Brian Williams wrote: > Greetings, > We'd like some pointers on how to tune to avoid (or more realistically delay as long as possible) promotion failures with CMS. Our server maintains an in memory database cache, that on appropriate hardware could take up over 100GB of RAM. > > Through what we've been able to find online, and lots of experimentation, we've made a lot of progress in tuning GC to work well for us. We have the same problem that others with similar access patterns have--no matter what, we eventually seem to hit a promotion failure, which triggers a STW serial collection. > > Here are the general principles that we've arrived at to delay the promotion failure: > > 1. Limit how much data is promoted to just what is actually old garbage. This can be done by having a large new size, survivor size, and tenuring threshold. > > 2. Use as large of heap as possible regardless of the size of the database cache that's needed. > > 3. If possible, fully preload the database cache into memory at startup, and then perform a System.gc() to fully compact the old generation. This will start things off with as little fragmentation as possible. > > A few questions > > 1. Is it better to have CMSInitiatingOccupancyFraction set closer to the amount of live data in the server so that CMS runs more frequently or to set this value as high as possible without running into a concurrent mode failure? > > 2. Would running with -XX:+AlwaysPreTouch make any difference? > > 3. 
We've seen mentioned on this list that there are additional things that can be done to tune against promotion failures, e.g. "As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics." But we haven't seen any pointers for how to go about this. Can you point us in the right direction? > > 4. Would changing any of the PLAB/TLAB settings make a difference? > > 5. What are the main factors that affect the duration of a promotion failure? Is it the amount of live data in bytes, the number of live objects, the total size of the heap, etc? > > 6. Are there any other JVM settings that we should try, other advice? > > By the way, we have given G1 a try, but we're still getting full GCs pretty frequently. > > Sorry for all of the questions. We definitely appreciate any help you can offer. > > Brian > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From brian.williams at mayalane.com Mon Nov 15 16:36:37 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Mon, 15 Nov 2010 19:36:37 -0500 Subject: CMS Promotion Failures In-Reply-To: <4CE1B125.2020302@oracle.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> Message-ID: <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> Thanks Ramki. If you can entertain a few followup questions: 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? Brian From y.s.ramakrishna at oracle.com Mon Nov 15 19:08:05 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Mon, 15 Nov 2010 19:08:05 -0800 Subject: CMS Promotion Failures In-Reply-To: <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> Message-ID: <4CE1F595.9080601@oracle.com> On 11/15/2010 4:36 PM, Brian Williams wrote: > > Thanks Ramki. If you can entertain a few followup questions: > > 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. I have not seen that behaviour before. The only cases where i can think of that occurring is if the heap occupancy is also montonically increasing so that the "free space" available keeps getting smaller. But I am grasping at straws here. > > 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). > If you are not using +ExplicitGCInvokesConcurrent, then both would leave the heap in an identical state, because they both cause a single-threaded (alas, still) compacting collection of the entire heap. So, yes, scheduling explicit gc's to compact down the heap at an opportune time would definitely be worthwhile, if possible. > 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? Yes, this is usually the case. 
More PLAB's (in the form of cached free lists with the individual GC worker threads) does translate to potentially more fragmentation, although i have generally found that our autonomic per-block inventory control usually results in keeping such fragmentation in check (unless the threads are "far too many" and the free space "too little"). -- ramki > > Brian From brian.williams at mayalane.com Wed Nov 17 06:17:14 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Wed, 17 Nov 2010 09:17:14 -0500 Subject: CMS Promotion Failures In-Reply-To: <4CE1F595.9080601@oracle.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> <4CE1F595.9080601@oracle.com> Message-ID: <7DA80581-3D38-478C-9621-32B561454B63@mayalane.com> On Nov 15, 2010, at 10:08 PM, Y. Srinivas Ramakrishna wrote: >> 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. > > I have not seen that behaviour before. The only cases where i can think of that occurring is if > the heap occupancy is also montonically increasing so that the "free space" available keeps > getting smaller. But I am grasping at straws here. Output from jstat seems to indicate that's not the case here. Unfortunately, we're seeing this on a production server that doesn't have GC logging enabled. We're in the process of trying to get it enabled so we can try to understand this better. > >> >> 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). >> > > If you are not using +ExplicitGCInvokesConcurrent, then both would leave the heap > in an identical state, because they both cause a single-threaded (alas, still) compacting > collection of the entire heap. So, yes, scheduling explicit gc's to compact down > the heap at an opportune time would definitely be worthwhile, if possible. > >> 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? > > Yes, this is usually the case. More PLAB's (in the form of cached free lists with the > individual GC worker threads) does translate to potentially more fragmentation, although > i have generally found that our autonomic per-block inventory control usually results > in keeping such fragmentation in check (unless the threads are "far too many" and the > free space "too little"). We're running on a 32-way x4600 and aren't setting the ParallelGC threads explicitly, so we're probably ending up with 32. We will try to dial it down to see how that helps. Thanks, Brian From brian.williams at mayalane.com Wed Nov 17 06:32:51 2010 From: brian.williams at mayalane.com (Brian Williams) Date: Wed, 17 Nov 2010 09:32:51 -0500 Subject: CMS Promotion Failures In-Reply-To: <4CE1ECE8.9040804@oracle.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1ECE8.9040804@oracle.com> Message-ID: <1135CAF6-F263-4AA7-A70C-FF20735841D4@mayalane.com> Hi Jon, We just received GC detail from one machine, a 16-way x4600. The promotion failure occurs 10 hours into the process life. 
-- application startup --- 2010-11-15T22:34:47.439-0800: [GC [ParNew: 1780514K->209664K(1887488K), 1.1993542 secs] 1780514K->369453K(24956160K), 1.1996619 secs] [Times: user=5.17 sys=1.05, real=1.20 secs] 2010-11-15T22:35:51.297-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.5736733 secs] 2047277K->610880K(24956160K), 0.5739673 secs] [Times: user=2.65 sys=0.44, real=0.57 secs] 2010-11-15T22:37:13.893-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.5714056 secs] 2288704K->838848K(24956160K), 0.5716780 secs] [Times: user=3.32 sys=0.24, real=0.57 secs] 2010-11-15T22:38:28.112-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.6964940 secs] 2516672K->1117914K(24956160K), 0.6967518 secs] [Times: user=4.06 sys=0.31, real=0.70 secs] 2010-11-15T22:40:02.015-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.6151412 secs] 2795738K->1369924K(24956160K), 0.6154426 secs] [Times: user=3.50 sys=0.28, real=0.62 secs] ... 50 CMS cycles pass ... 2010-11-16T10:05:46.116-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.4683483 secs] 8156115K->6706120K(24956160K), 0.4686710 secs] [Times: user=4.09 sys=0.25, real=0.47 secs] 2010-11-16T10:06:15.535-0800: [GC [ParNew: 1887488K->201665K(1887488K), 0.3743882 secs] 8383944K->6904794K(24956160K), 0.3746896 secs] [Times: user=2.88 sys=0.32, real=0.37 secs] 2010-11-16T10:06:25.861-0800: [GC [ParNew: 1879489K->209664K(1887488K), 0.6735419 secs] 8582618K->7164756K(24956160K), 0.6738153 secs] [Times: user=5.83 sys=0.68, real=0.67 secs] 2010-11-16T10:06:26.537-0800: [GC [1 CMS-initial-mark: 6955092K(23068672K)] 7172883K(24956160K), 0.1812929 secs] [Times: user=0.18 sys=0.00, real=0.18 secs] 2010-11-16T10:06:28.457-0800: [CMS-concurrent-mark: 1.708/1.738 secs] [Times: user=8.80 sys=0.16, real=1.74 secs] 2010-11-16T10:06:28.768-0800: [CMS-concurrent-preclean: 0.279/0.311 secs] [Times: user=0.75 sys=0.05, real=0.31 secs] 2010-11-16T10:06:54.938-0800: [GC [ParNew: 1887488K->208834K(1887488K), 0.3598649 secs] 8842580K->7358032K(24956160K), 0.3601703 secs] [Times: user=3.07 sys=0.11, real=0.36 secs] 2010-11-16T10:06:57.753-0800: [CMS-concurrent-abortable-preclean: 28.096/28.986 secs] [Times: user=44.84 sys=2.12, real=28.99 secs] 2010-11-16T10:06:57.755-0800: [GC[YG occupancy: 1050375 K (1887488 K)]2010-11-16T10:06:57.755-0800: [GC [ParNew (promotion failed): 1050375K->1051204K(1887488K), 0.9133199 secs] 8199573K->8337584K(24956160K), 0.9136117 secs] [Times: user=3.90 sys=0.01, real=0.91 secs] [Rescan (parallel) , 0.7407982 secs][weak refs processing, 0.0022770 secs] [1 CMS-remark: 7286379K(23068672K)] 8337584K(24956160K), 1.6572438 secs] [Times: user=7.01 sys=0.06, real=1.66 secs] 2010-11-16T10:07:01.679-0800: [Full GC [CMS2010-11-16T10:07:42.463-0800: [CMS-concurrent-sweep: 43.044/43.050 secs] [Times: user=48.80 sys=0.72, real=43.05 secs] (concurrent mode failure): 6616900K->2119438K(23068672K), 61.3282450 secs] 8504388K->2119438K(24956160K), [CMS Perm : 49108K->48394K(82008K)], 61.3285709 secs] [Times: user=61.33 sys=0.01, real=61.33 secs] 2010-11-16T10:08:08.462-0800: [GC [ParNew: 1677824K->157105K(1887488K), 0.0797175 secs] 3797262K->2276544K(24956160K), 0.0800104 secs] [Times: user=1.00 sys=0.00, real=0.08 secs] 2010-11-16T10:08:36.240-0800: [GC [ParNew: 1834929K->136334K(1887488K), 0.1978673 secs] 3954368K->2386916K(24956160K), 0.1981614 secs] [Times: user=1.72 sys=0.04, real=0.20 secs] The average amount data promoted per ParNew is 187m and we are looking into why it is so large. If you have any insight into this particular promotion failure, we would appreciate it. 
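For anyone reading along, the per-scavenge promotion volume can be read straight off these lines, assuming the old generation only grows (by promotion) during the ParNew stop-the-world window: promoted is roughly (heap_after - young_after) - (heap_before - young_before). For the first post-startup scavenge quoted above:

    old gen before:  8156115K - 1887488K = 6268627K
    old gen after :  6706120K -  209664K = 6496456K
    promoted      :  6496456K - 6268627K =  227829K   (~222 MB)

That is of the same order as the ~187 MB average mentioned here, and promotion on that scale every scavenge fills and fragments the old generation quickly, which is consistent with the earlier advice in this thread about limiting what gets promoted before suspecting anything else.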
Thanks, Brian On Nov 15, 2010, at 9:31 PM, Jon Masamitsu wrote: > Brian, > > Ramki and Shaun have addressed most of your questions I think. > Just wanted to know what type of platform (how many hardware > threads) you're using. Also what is CMS doing when the > promotion failures are happening (concurrent marking, > preclean cleaning or sweeping)? > > Jon > From y.s.ramakrishna at oracle.com Wed Nov 17 09:21:02 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 17 Nov 2010 09:21:02 -0800 Subject: CMS Promotion Failures In-Reply-To: <7DA80581-3D38-478C-9621-32B561454B63@mayalane.com> References: <3AB9C9E5-6C63-4DBE-A8D3-E2189EE4D53C@mayalane.com> <4CE1B125.2020302@oracle.com> <0CAD633F-B197-4EB4-A57D-F9BF69237F9A@mayalane.com> <4CE1F595.9080601@oracle.com> <7DA80581-3D38-478C-9621-32B561454B63@mayalane.com> Message-ID: <4CE40EFE.6000304@oracle.com> On 11/17/10 06:17, Brian Williams wrote: > > On Nov 15, 2010, at 10:08 PM, Y. Srinivas Ramakrishna wrote: > >>> 1. If there is anything that could explain, beyond application usage, getting the promotion failures closer and closer together. >> I have not seen that behaviour before. The only cases where i can think of that occurring is if >> the heap occupancy is also montonically increasing so that the "free space" available keeps >> getting smaller. But I am grasping at straws here. > > Output from jstat seems to indicate that's not the case here. Unfortunately, we're seeing this on a production server that doesn't have GC logging enabled. We're in the process of trying to get it enabled so we can try to understand this better. > OK, thanks. >>> 2. And as a follow on question. If calling System.gc() leaves the heap in a better state than a promotion failure? (This will help us to answer whether we want to push for a server restart or a scheduled GC). >>> >> If you are not using +ExplicitGCInvokesConcurrent, then both would leave the heap >> in an identical state, because they both cause a single-threaded (alas, still) compacting >> collection of the entire heap. So, yes, scheduling explicit gc's to compact down >> the heap at an opportune time would definitely be worthwhile, if possible. >> >>> 3. Would using fewer parallel GC threads help reduce the fragmentation by having fewer PLABs? >> Yes, this is usually the case. More PLAB's (in the form of cached free lists with the >> individual GC worker threads) does translate to potentially more fragmentation, although >> i have generally found that our autonomic per-block inventory control usually results >> in keeping such fragmentation in check (unless the threads are "far too many" and the >> free space "too little"). > > We're running on a 32-way x4600 and aren't setting the ParallelGC threads explicitly, so we're probably ending up with 32. We will try to dial it down to see how that helps. > I think you get 5/8*n, so prbably closer to 20. With the amount of data that is copied per scavenge and the size of yr old gen, 20 seems reasonable and probably does not need dialing down (at least at first blush). From looking at the snippets you sent, it almost seems like some kind of bug in CMS allocation because there is plenty of free space (and comparatively not that much promotion) when the promotion failure occurs (although full gc logs would be needed before one could be confident of this pronouncement). So it would be worthwhile to investigate this closely to see why this is happening. I somehow do not think this is a tuning issue, but something else. 
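To make the thread-count arithmetic explicit: with the 5/8 rule mentioned above, a 32-way box ends up with roughly 5/8 x 32 = 20 parallel GC worker threads, each keeping its own promotion buffers, which is the fragmentation connection discussed earlier in this thread. If dialing it down is attempted anyway, the count can be pinned with the flag already seen in this digest (the value 8 here is only an example):

    -XX:ParallelGCThreads=8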
Do you have Java support and able to open a formal ticket with Oracle, so some formal/dedicated cycles can be devoted looking at the issue? What's the version of JDK you are running? -- ramki > Thanks, > Brian From tanman12345 at yahoo.com Wed Nov 17 14:44:42 2010 From: tanman12345 at yahoo.com (Erwin) Date: Wed, 17 Nov 2010 14:44:42 -0800 (PST) Subject: Intermittent long ParNew times Message-ID: <288296.29496.qm@web111114.mail.gq1.yahoo.com> Hello, When we?re running our load test for about 1 hour, GC seems to be fine most of the times. However, there are times where the ParNew would go as high as 25 seconds. See below sample where it was 10 seconds. {Heap before GC invocations=11 (full 0): par new generation total 921600K, used 880508K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) from space 102400K, 59% used [0xfffffffe08400000, 0xfffffffe0bfdf1c8, 0xfffffffe0e800000) to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 37814.384: [GC 37814.384: [ParNew: 880508K->55794K(921600K), 0.1246958 secs] 1566486K->741772K(4091904K), 0.1249910 secs] [Times: user=0.37 sys=0.07, real=0.13 secs] Heap after GC invocations=12 (full 0): par new generation total 921600K, used 55794K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) {Heap before GC invocations=12 (full 0): par new generation total 921600K, used 874994K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 39088.225: [GC 39088.225: [ParNew: 874994K->102400K(921600K), 10.0339890 secs] 1560972K->821401K(4091904K), 10.0346984 secs] [Times: user=5.40 sys=31.71, real=10.04 secs] Heap after GC invocations=13 (full 0): par new generation total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) concurrent mark-sweep generation total 3170304K, used 719001K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) concurrent-mark-sweep perm gen total 524288K, 
used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) We?re on 64bit platform of WAS NDE 7.0.0.9 on Solaris10 platform. Our JVM args are: -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled Any help would be appreciated. Erwin From y.s.ramakrishna at oracle.com Wed Nov 17 16:27:02 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 17 Nov 2010 16:27:02 -0800 Subject: Intermittent long ParNew times In-Reply-To: <288296.29496.qm@web111114.mail.gq1.yahoo.com> References: <288296.29496.qm@web111114.mail.gq1.yahoo.com> Message-ID: <4CE472D6.8000105@oracle.com> That long scavenge shows unusually high system time too. Did you make sure there was no periodic (or aperiodic) other activity on the system that may be causing part of the JVM heap to get paged out? I'd check vmstat for starters. (Also, FWIW, and just to rule it out as a factor, check the promotion volume for these scavenges and see if it shows anything.) And while I am throwing out conjectures, does this happen only during the initial start-up phase when the old heap occupancy is growing? If so, see if -XX:+AlwaysPreTouch makes any difference (also mentioned recently by Brian Williams in a separate thread here). -- ramki On 11/17/10 14:44, Erwin wrote: > Hello, > > When we?re running our load test for about 1 hour, GC seems to be fine most of the times. However, there are times where the ParNew would go as high as 25 seconds. See below sample where it was 10 seconds. 
> {Heap before GC invocations=11 (full 0): > par new generation total 921600K, used 880508K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > from space 102400K, 59% used [0xfffffffe08400000, 0xfffffffe0bfdf1c8, 0xfffffffe0e800000) > to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 37814.384: [GC 37814.384: [ParNew: 880508K->55794K(921600K), 0.1246958 secs] 1566486K->741772K(4091904K), 0.1249910 secs] [Times: user=0.37 sys=0.07, real=0.13 secs] > Heap after GC invocations=12 (full 0): > par new generation total 921600K, used 55794K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) > to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221175K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > > {Heap before GC invocations=12 (full 0): > par new generation total 921600K, used 874994K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > from space 102400K, 54% used [0xfffffffe02000000, 0xfffffffe0567c880, 0xfffffffe08400000) > to space 102400K, 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > concurrent mark-sweep generation total 3170304K, used 685978K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 39088.225: [GC 39088.225: [ParNew: 874994K->102400K(921600K), 10.0339890 secs] 1560972K->821401K(4091904K), 10.0346984 secs] [Times: user=5.40 sys=31.71, real=10.04 secs] > Heap after GC invocations=13 (full 0): > par new generation total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > eden space 819200K, 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > to space 102400K, 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > concurrent mark-sweep generation total 3170304K, used 719001K [0xfffffffe0e800000, 0xfffffffed0000000, 0xffffffff48000000) > concurrent-mark-sweep perm gen total 524288K, used 221531K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > > We?re on 64bit platform of WAS NDE 7.0.0.9 on Solaris10 platform. 
Our JVM args are: > -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled > > Any help would be appreciated. > Erwin > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
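A closing note on the numbers in that last scavenge: user=5.40 sys=31.71 real=10.04 means the 10 seconds of wall time were spent almost entirely in the kernel, spread across many GC threads, which is why the reply above points at the OS rather than the collector. A minimal first check, assuming a Solaris 10 box as described, is to watch paging activity while the load test runs:

    vmstat 5      (watch the 'sr' page-scan rate and 'free' columns for memory pressure)

and, if the long pauses only occur while the old generation is still being touched for the first time, to try the flag suggested above:

    -XX:+AlwaysPreTouch

so that the cost of first-touching (and, with large pages, zero-filling) old-gen pages is paid once at startup instead of inside a scavenge.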