Strange long ParNew collections

Y.S.Ramakrishna at Sun.COM
Tue May 19 18:15:10 UTC 2009


Yes, from your description, it's very likely you are
running into the bug that was fixed in CR 6786503. The
fix is available in 1.5u19 (and in 6u14 and OpenJDK 7).

Your Sun support can confirm that this is indeed the bug if you send
them a series of pstacks taken during the "hang". (I suppose you
would need to have your watchdog collect the gcore/pstack before
restarting the process, since the onset of the bug is somewhat
unpredictable and you can't wait for the event before your watchdog
restarts the process. Alternatively, in a test environment you can
turn the watchdog off and collect the pstacks manually, since the
"hang" usually lasts for a while, even if you have gone out for a
cup of coffee :-).)
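
For concreteness, the collection might look something like the
following on a Solaris box where the proc tools are available. This
is only an illustrative sketch (the paths and the pgrep pattern are
made up); on your Windows box the analogue would be whatever stack
snapshot your support folks ask for:

  pgrep -f jboss                       # find the JVM's pid
  gcore -o /var/tmp/jvm <pid>          # optionally grab a core first
  pstack <pid> > /var/tmp/pstack.1     # then a series of pstacks, a few
  pstack <pid> > /var/tmp/pstack.2     # seconds apart, while the "hang"
  pstack <pid> > /var/tmp/pstack.3     # is still in progress

A handful of such stack snapshots taken during the stall is what
support would want to see.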

-- ramki

On 05/19/09 04:27, Kaj Nygren wrote:
> Hi,
> 
> Thanks for your help. I wanted to give an update and ask a couple 
> more questions :)
> 
> We did change to -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=4  and now 
> the ParNew collections are working a lot better. We have also turned off 
> incremental mode as Jon suggested.
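> 
> Concretely (the entry numbers below are just illustrative), the
> relevant wrapper lines now read:
> 
> wrapper.java.additional.18="-XX:SurvivorRatio=8"
> wrapper.java.additional.19="-XX:MaxTenuringThreshold=4"
> 
> with the -XX:+CMSIncrementalMode and -XX:+CMSIncrementalPacing
> entries removed.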
> 
> We do have one problem left, though: it seems that the CMS GC 
> sometimes gets stuck during the parallel rescan. The output looks 
> like this:
> 
> 24004,062: [CMS-concurrent-sweep: 1,755/1,932 secs]
> 24004,062: [CMS-concurrent-reset-start]
> 24004,163: [CMS-concurrent-reset: 0,101/0,101 secs]
> 24006,163: [GC [1 CMS-initial-mark: 10348978K(18432000K)] 
> 10521837K(19353600K), 0,1055176 secs]
> 24006,269: [CMS-concurrent-mark-start]
> 24010,642: [CMS-concurrent-mark: 4,373/4,373 secs]
> 24010,642: [CMS-concurrent-preclean-start]
> 24010,730: [CMS-concurrent-preclean: 0,085/0,088 secs]
> 24010,730: [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 24011,824: 
> [CMS-concurrent-abortable-preclean: 0,792/1,094 secs]
> 24011,828: [GC[YG occupancy: 332774 K (921600 K)]
> 24011,829: [Rescan (parallel)
> 
> Then the JVM hangs for more than 5 minutes, at which point our 
> watchdog decides that the server is hung and restarts it. This 
> happened with iCMS, and apparently it still happens with plain CMS.
> 
> I searched around and found one similar report:
> 
> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2009-January/000327.html
> 
> I know this is just a guess, but could we be experiencing the same 
> bug? If so, am I correct in reading the bug reports as saying it 
> should be fixed in 1.5.0_18? (We are running 1.5.0_16 at the moment.)
> 
> /Kaj
> 
> Y.S.Ramakrishna at Sun.COM wrote:
>> This may or may not be the cause of the sudden spikes in ParNew
>> collection times, but I'd suggest using the survivor spaces to age
>> objects for at least a few collections before promoting them to the
>> old gen. This reduces the pressure on old-gen allocation (during
>> scavenge), which improves scavenges, and it eases the CMS
>> collection as well.
>>
>> I'd suggest trying -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=4
>> to start off with and iterate based on data from 
>> -XX:+PrintTenuringDistribution.
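>>
>> With your current -XX:NewSize=1000m those numbers work out roughly
>> as follows (each survivor space is NewSize / (SurvivorRatio + 2)):
>>
>>   survivor space = 1000m / (8 + 2) = ~100m each
>>   eden           = 1000m - 2*100m  = ~800m
>>
>> and objects get up to 4 scavenges of aging in the survivor spaces
>> before being promoted.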
>>
>> It is possible that the promotion of very short-lived objects fragments
>> the old gen and causes old gen allocation (for promotion) to slow down.
>> It can also cause lots of mutation in the old gen and adversely affect
>> card-scanning times, further slowing down scavenges.
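>>
>> As a purely illustrative Java sketch (a hypothetical class with
>> made-up sizes, not your application), this is the kind of allocation
>> pattern that hits that failure mode: each object stays live just
>> long enough to straddle a scavenge and, with no survivor aging, is
>> copied straight into the old gen, where it dies shortly afterwards.
>>
>> import java.util.LinkedList;
>>
>> public class ShortLivedPromotion {
>>     public static void main(String[] args) {
>>         // Sliding window of buffers: each buffer stays live for a
>>         // short while and is then dropped.  Any buffer that happens
>>         // to be live at scavenge time is promoted immediately if the
>>         // survivor spaces cannot hold it (or tenuring is effectively
>>         // disabled), and then becomes garbage, and card mutation, in
>>         // the old gen instead of cheap young-gen garbage.
>>         LinkedList window = new LinkedList();
>>         for (int i = 0; i < 10000000; i++) {
>>             window.addLast(new byte[4096]);
>>             if (window.size() > 50000) {
>>                 window.removeFirst();
>>             }
>>         }
>>         System.out.println("done; final window size " + window.size());
>>     }
>> }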
>>
>> -- ramki
>>
>> On 05/06/09 05:06, Kaj Nygren wrote:
>>> Hi,
>>>
>>> We are having problems with occasional long ParNew pauses in our 
>>> application running on a JBoss server. The server runs well most 
>>> of the time, but sometimes we get ParNew collections that can take 
>>> up to 120 seconds. Most of the time we have several consecutive slow 
>>> collections, which more or less kill the server under high load.
>>>
>>> We are using JDK 1.5.0_16 on Windows Server 2003, with 
>>> ParNew + CMS (concurrent mark-sweep).
>>>
>>> Our GC configuration is:
>>>
>>> wrapper.java.additional.1=-Dprogram.name=run.bat
>>> wrapper.java.additional.2=-server
>>> wrapper.java.additional.3=-Xms19000m
>>> wrapper.java.additional.4=-Xmx19000m
>>> wrapper.java.additional.5="-XX:NewSize=1000m"
>>> wrapper.java.additional.6="-XX:MaxNewSize=1000m"
>>> wrapper.java.additional.7="-XX:MaxPermSize=256m"
>>> wrapper.java.additional.8="-XX:PermSize=256m"
>>> wrapper.java.additional.11="-Xloggc:c:\mygclog.gc"
>>> wrapper.java.additional.12="-XX:+UseConcMarkSweepGC"
>>> wrapper.java.additional.13="-XX:+PrintGCDetails"
>>> wrapper.java.additional.14="-XX:-PrintTenuringDistribution"
>>> wrapper.java.additional.15="-XX:+PrintHeapAtGC"
>>> wrapper.java.additional.16="-XX:+DisableExplicitGC"
>>> wrapper.java.additional.17="-XX:+PrintGCTimeStamps"
>>> wrapper.java.additional.18="-XX:CMSInitiatingOccupancyFraction=50"
>>> wrapper.java.additional.19="-XX:+CMSIncrementalMode"
>>> wrapper.java.additional.20="-XX:+CMSIncrementalPacing"
>>> wrapper.java.additional.21="-verbose:gc"
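>>>
>>> (Quick arithmetic on that configuration: 19000m of total heap minus
>>> the 1000m young gen leaves 18000m = 18432000K for the CMS old gen,
>>> which is exactly what the logs below report. The young gen shows up
>>> as 1023040K usable, i.e. a 1022080K eden plus a single 960K
>>> survivor space.)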
>>>
>>> Usually ParNew is quick, as can be seen here:
>>>
>>> 614276,653: [ParNew: 935058K->0K(1023040K), 0,0624699 secs] 
>>> 12383872K->11459948K(19455040K) icms_dc=22 Heap after gc 
>>> invocations=57822:
>>> par new generation   total 1023040K, used 0K [0x000000007fff0000, 
>>> 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K,   0% used [0x000000007fff0000, 
>>> 0x000000007fff0000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>>  to   space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>> concurrent mark-sweep generation total 18432000K, used 11459948K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112069K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> }
>>> , 0,0628860 secs]
>>>
>>> But suddenly it becomes really slow:
>>>
>>> 620659,717: [ParNew: 1022080K->0K(1023040K), 1,3313817 secs] 
>>> 12688913K->11692925K(19455040K) icms_dc=10 Heap after gc 
>>> invocations=58587:
>>> par new generation   total 1023040K, used 0K [0x000000007fff0000, 
>>> 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K,   0% used [0x000000007fff0000, 
>>> 0x000000007fff0000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>>  to   space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>> concurrent mark-sweep generation total 18432000K, used 11692925K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112098K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> }
>>> , 1,3319554 secs]
>>> 620663,221: [GC {Heap before gc invocations=58587:
>>> par new generation   total 1023040K, used 1022080K 
>>> [0x000000007fff0000, 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K, 100% used [0x000000007fff0000, 
>>> 0x00000000be610000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>>  to   space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>> concurrent mark-sweep generation total 18432000K, used 11692925K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112106K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> 620663,222: [ParNew: 1022080K->0K(1023040K), 10,2909893 secs] 
>>> 12715005K->11936864K(19455040K) icms_dc=10 Heap after gc 
>>> invocations=58588:
>>> par new generation   total 1023040K, used 0K [0x000000007fff0000, 
>>> 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K,   0% used [0x000000007fff0000, 
>>> 0x000000007fff0000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>>  to   space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>> concurrent mark-sweep generation total 18432000K, used 11936864K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112106K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> }
>>> , 10,2915548 secs]
>>> 620676,392: [GC {Heap before gc invocations=58588:
>>> par new generation   total 1023040K, used 1022080K 
>>> [0x000000007fff0000, 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K, 100% used [0x000000007fff0000, 
>>> 0x00000000be610000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>>  to   space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>> concurrent mark-sweep generation total 18432000K, used 11936864K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112106K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> 620676,392: [ParNew: 1022080K->0K(1023040K), 3,8905950 secs] 
>>> 12958944K->12020225K(19455040K) icms_dc=10 Heap after gc 
>>> invocations=58589:
>>> par new generation   total 1023040K, used 0K [0x000000007fff0000, 
>>> 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K,   0% used [0x000000007fff0000, 
>>> 0x000000007fff0000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>>  to   space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>> concurrent mark-sweep generation total 18432000K, used 12020225K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112106K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> }
>>> , 3,8911113 secs]
>>> 620682,667: [GC {Heap before gc invocations=58589:
>>> par new generation   total 1023040K, used 1022080K 
>>> [0x000000007fff0000, 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K, 100% used [0x000000007fff0000, 
>>> 0x00000000be610000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>>  to   space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>> concurrent mark-sweep generation total 18432000K, used 12020225K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112112K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> 620682,668: [ParNew: 1022080K->0K(1023040K), 9,4186088 secs] 
>>> 13042305K->12214571K(19455040K) icms_dc=10 Heap after gc 
>>> invocations=58590:
>>> par new generation   total 1023040K, used 0K [0x000000007fff0000, 
>>> 0x00000000be7f0000, 0x00000000be7f0000)
>>>  eden space 1022080K,   0% used [0x000000007fff0000, 
>>> 0x000000007fff0000, 0x00000000be610000)
>>>  from space 960K,   0% used [0x00000000be700000, 0x00000000be700000, 
>>> 0x00000000be7f0000)
>>>  to   space 960K,   0% used [0x00000000be610000, 0x00000000be610000, 
>>> 0x00000000be700000)
>>> concurrent mark-sweep generation total 18432000K, used 12214571K 
>>> [0x00000000be7f0000, 0x00000005237f0000, 0x00000005237f0000)
>>> concurrent-mark-sweep perm gen total 262144K, used 112112K 
>>> [0x00000005237f0000, 0x00000005337f0000, 0x00000005337f0000)
>>> }
>>> , 9,4191551 secs]
>>>
>>> We also have problems with occasional "promotion failed" events, 
>>> which I assume are due to fragmentation.
>>> I have attached a longer log, which shows this behaviour and ends 
>>> with a promotion failure.
>>>
>>> Does anybody have a clue as to what is going wrong, and how we can 
>>> change our GC configuration to reduce the problems?
>>> Any help is greatly appreciated!
>>>
>>> Regards
>>>
>>> Kaj Nygren
>>> kaj at mag.se
>>>
>>>
> 



