64 bit CMS JDK 5.0 u14

Mon Dec 10 18:57:33 UTC 2007

Hi Keith --

Jon also talks about this and related
issues in a few of his blog posts; see:

http://blogs.sun.com/jonthecollector/date/20060413
http://blogs.sun.com/jonthecollector/date/20060306
http://blogs.sun.com/jonthecollector/date/20060404
http://blogs.sun.com/jonthecollector/date/20070622

-- ramki

Y.S.Ramakrishna at Sun.COM wrote:
> Hi Keith --
> 
>> I am running some middle-tier Portal load tests in WLS 9.2 MP2 with Sun
>> JDK 5.0 u14. I am running 100 concurrent users that log on, navigate,
>> open portlets, and eventually log off, only to log on again and repeat
>> the cycle. My testing people have established Load Runner scripts to put
>> the Portal software through an endurance test over 5 days with the 100
>> users.
>>
>> Normally on 32-bit WLS 8.1 SP6 with JDK 1.4.2_13 we run with the 
>> following VM args; we attain some 3.4 million passed transactions with 
>> zero failed transactions:
>>
>> -server -Xms1400m -Xmx1400m -XX:NewSize=64m -XX:MaxNewSize=64m 
>> -XX:PermSize=128m -XX:MaxPermSize=128m -Xss128k -XX:-UseTLAB 
>> -XX:+DisableExplicitGC
>> -Dsun.rmi.dgc.client.gcInterval=3600000 
>> -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true
>>
>> CMS never works very well in the 32-bit environment, failing miserably
>> above, although at JDK 1.4.2_15 we see some 2 million passed
>> transactions with 1130+ failed transactions owing to 120-second
>> timeouts in concurrent mode failures.
>>
>> Now, in the 64 bit environment running on AMD Windows Server 2003, I 
>> can run pretty successfully with CMS:
>>
>> -Xms1500m -Xmx3500m -XX:NewSize=320m -XX:MaxNewSize=320m -Xss256k 
>> -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC 
>> -XX:CMSFullGCsBeforeCompaction=0 -XX:+UseCMSInitiatingOccupancyOnly 
>> -XX:CMSInitiatingOccupancyFraction=40 
>> -Dsun.rmi.dgc.client.gcInterval=3600000 
>> -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true 
>> -Dcom.sun.management.jmxremote -verbosegc 
>> -Xloggc:C:\keith\GCLogs\gc11.txt
>>
>> I can achieve some 3.4 million passed transactions, the same as with
>> the throughput collector in the 32-bit env.
>>
>> But, I do note several thousand failed transactions that correlate 
>> with concurrent mode failures after some 24 hrs; pauses in the 400-600 
>> seconds range when Full GC takes over.
>>
> 
> The fact that you start seeing the concurrent mode failures after
> 24 hours indicates to me strongly that the old generation gets slowly
> fragmented over a period of time. (Recall that the CMS collector is
> non-moving.)
> 
> Can you confirm that the heap occupancy itself is constant (or nearly
> so) following CMS collection cycles, and that the full gc that follows
> a concurrent mode failure does not unload classes? Recall that CMS will
> not, by default, unload classes during concurrent cycles unless
> explicitly instructed to do so via:
> 
>      -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
> 
> (the second option is needed in pre-6.0 JVMs, but not in more
> recent JVMs).
> 
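Besides reading the GC log, one way to check post-collection occupancy
is to ask the JVM's memory pool MXBeans (java.lang.management, available
since JDK 5.0). The sketch below is illustrative only: the class name
PostGcOccupancy and the method snapshot() are made up for this note, and
the code reports on the JVM it runs in, so it would have to be called
from inside the server (or adapted to a remote JMX connection) to say
anything about the WLS heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any JDK or WLS API.
public class PostGcOccupancy {

    // Returns one line per memory pool that supports post-collection usage.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Usage measured right after the most recent collection of this
            // pool; null when the pool does not support the attribute.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                continue;
            }
            long usedMb = afterGc.getUsed() / (1024L * 1024L);
            long committedMb = afterGc.getCommitted() / (1024L * 1024L);
            lines.add(pool.getName() + ": " + usedMb + " MB used of "
                    + committedMb + " MB committed after last GC");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

If the old gen and perm gen figures stay roughly flat from cycle to
cycle over the 24 hours, growth in live data looks unlikely as the
cause, which would be consistent with the fragmentation theory.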
>> I have tried varying the CMSInitiatingOccupancyFraction to 20%, but 
>> the CMS mode failures still occur.
> 
> It is usually a good idea to use survivor spaces, both to reduce the
> pressure on the concurrent collector (by promoting less to the old
> gen) and to reduce the spread in object sizes and lifetimes
> of the objects that do get promoted. I'd suggest using survivor
> spaces to make sure that survivors stay in the young gen for at least
> one scavenge (MaxTenuringThreshold = 1, possibly more, as experiments
> dictate). A downside is possibly longer scavenges,
> but consider that the price for (possibly) avoiding concurrent
> mode failure.
> 
> Avoiding premature promotion of objects can (besides the two points
> made above) also reduce floating garbage and reduce CMS remark pause
> times (by reducing mutation rates in the old generation).
> 
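For the 320 MB young gen in the command line quoted earlier, the
arithmetic behind -XX:SurvivorRatio can be sketched as below. The class
SurvivorSizing is a made-up helper for this note, and the ratio of 8 is
only an assumed example value, not a recommendation; the JVM may also
round the real sizes for alignment.

```java
// Hypothetical helper for back-of-the-envelope sizing; not a JDK API.
public class SurvivorSizing {

    // SurvivorRatio is the ratio of eden to ONE survivor space, and the
    // young gen holds eden plus two survivor spaces.
    static long survivorMb(long youngGenMb, int survivorRatio) {
        return youngGenMb / (survivorRatio + 2);
    }

    // Eden is whatever remains after the two survivor spaces.
    static long edenMb(long youngGenMb, int survivorRatio) {
        return youngGenMb - 2 * survivorMb(youngGenMb, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 320; // from -XX:NewSize=320m -XX:MaxNewSize=320m
        int ratio = 8;    // assumed example value for -XX:SurvivorRatio
        System.out.println("survivor (each): " + survivorMb(young, ratio) + " MB");
        System.out.println("eden: " + edenMb(young, ratio) + " MB");
    }
}
```

With these inputs each survivor space comes to 32 MB and eden to 256 MB.
Adding -XX:MaxTenuringThreshold=1 and -XX:+PrintTenuringDistribution to
the command line would then show in the GC log how much data survives
each scavenge, which is the kind of experiment suggested above.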
>>
>> I am now running with the incremental mode CMS; but anticipate further 
>> very long pauses.
> 
> From what you described above (running CMS all the time by setting the
> initiation threshold very low), it does not look as though iCMS will
> buy you anything.
> 
>>
>> The VM always recovers very well after these sporadic Full GCs, but to
>> eradicate them, should I run with an 8 GB heap or something along
>> those lines? I also read something about killing the swap file?
>>
>> My AMD 64-bit box is, unfortunately, restricted to 4 GB RAM for now, but
>> I am adding a further 4 GB soon. I am about to go to the Solaris SPARC
>> 64 bit and run the exact same scenario with a 7-8 GB heap.
> 
> Increasing the heap size can indeed sometimes help you avoid
> concurrent mode failure from fragmentation. (But first make
> sure to enable survivor spaces and, if applicable, perm gen
> collection.)
> 
>>
>> I read about the occupancy fraction for OG and Perm Gen; do I need to
>> apply this patch? Our Perm Gen is always set to 128 MB and only ever
>> attains 108 MB.
> 
> The webrev I posted late last week should not really apply directly to
> your case (except inasmuch as, in the event that you enable perm gen
> collection, it might allow you to get away with not collecting the
> perm gen on every cycle, and thus possibly help keep CMS remark pauses
> shorter). I would not worry about this patch at the level at which you
> are tuning currently (which is mainly looking to avoid the concurrent
> mode failures).
> 
> -- ramki
> 
>>
>> Any feedback would help us in our endeavours to support our EBI apps 
>> in a 64 bit env.
>>
>> keith
>>
>>
>> Keith R Holdaway
>> Java Development Technologies
>>
>> SAS...  The Power to Know
>>
>> Carpe Diem ...
>>
> 
>