64 bit CMS JDK 5.0 u14

Y.S.Ramakrishna at Sun.COM
Mon Dec 10 18:44:48 UTC 2007


Hi Keith --

> I am running some middle-tier Portal load tests in WLS 9.2 MP2 with Sun JDK 5.0 u14. I am running 100 concurrent users that log on, navigate, open portlets, and eventually log off, only to log on again and repeat the cycle. My testing people have established Load Runner scripts to put the Portal software through an endurance test over 5 days with the 100 users.
> 
> Normally on 32-bit WLS 8.1 SP6 with JDK 1.4.2_13 we run with the following VM args; we attain some 3.4 million passed transactions with zero failed transactions:
> 
> -server -Xms1400m -Xmx1400m -XX:NewSize=64m -XX:MaxNewSize=64m -XX:PermSize=128m -XX:MaxPermSize=128m -Xss128k -XX:-UseTLAB -XX:+DisableExplicitGC
> -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true
> 
> CMS never works very well in the 32-bit environment; failing miserably above; although at JDK 1.4.2_15, we see some 2 million passed transactions with 1130+ failed transactions owing to 120 seconds timeouts in concurrent mode failures.
> 
> Now, in the 64 bit environment running on AMD Windows Server 2003, I can run pretty successfully with CMS:
> 
> -Xms1500m -Xmx3500m -XX:NewSize=320m -XX:MaxNewSize=320m -Xss256k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=0 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=40 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true -Dcom.sun.management.jmxremote -verbosegc -Xloggc:C:\keith\GCLogs\gc11.txt
> 
> I can achieve some 3.4 million passed transactions, as with the throughput collector in the 32-bit env.
> 
> But, I do note several thousand failed transactions that correlate with concurrent mode failures after some 24 hrs; pauses in the 400-600 seconds range when Full GC takes over.
> 

The fact that you start seeing the concurrent mode failures after
24 hours indicates to me strongly that the old generation gets slowly
fragmented over a period of time. (Recall that the CMS collector is
non-moving.)

Can you confirm that the heap occupancy itself is constant (or nearly
so) following CMS collection cycles, and that the full gc that follows
a concurrent mode failure does not unload classes? Recall that CMS will
not, by default, unload classes during concurrent cycles unless
explicitly instructed to do so via:

      -XX:+CMSClassUnloadingEnabled -XX:+PermGenSweepingEnabled

(the second option is needed in pre-6.0 JVMs, but not in more
recent JVMs).
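
One way to check whether occupancy after collection stays flat, besides
reading the GC log, is to sample the memory pool MXBeans from inside the
server JVM. The following is only a minimal sketch: the class name
PostGcOccupancy and the output format are made up for illustration, and
it assumes the code runs inside the JVM being observed (run standalone,
it reports on its own heap, not on the WLS heap). Whether a concurrent
CMS cycle updates the "collection usage" figure on a given JVM build is
worth verifying against the GC log before relying on it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class PostGcOccupancy {

    /**
     * Returns one line per memory pool with the bytes in use as of the
     * most recent collection of that pool, or -1 if the pool does not
     * report a collection usage.
     */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getCollectionUsage() returns null when the pool does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            long used = (afterGc == null) ? -1L : afterGc.getUsed();
            lines.add(pool.getName() + " used-after-gc-bytes=" + used);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print a single sample; call snapshot() periodically to see a trend.
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

If the figure for the old generation pool stays roughly level over many
hours while concurrent mode failures still appear, that would be
consistent with fragmentation rather than with a growing live set.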

> I have tried varying the CMSInitiatingOccupancyFraction to 20%, but the CMS mode failures still occur.

It is usually a good idea to use survivor spaces, both to reduce the
pressure on the concurrent collector (by promoting less to the old
gen) and to reduce the spread in object sizes and lifetimes
of the objects that do get promoted. I'd suggest using survivor
spaces to make sure that survivors stay in the young gen for at least
one scavenge (MaxTenuringThreshold = 1, possibly more, as experiments
dictate). A downside is possibly longer scavenges,
but consider that the price for (possibly) avoiding concurrent
mode failure.
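
As a starting point only (the values here are assumptions chosen for
illustration, not measured recommendations), survivor spaces could be
enabled on top of the existing 320m young gen with something like:

      -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+PrintTenuringDistribution

With a 320m young gen, SurvivorRatio=8 would size each of the two
survivor spaces at one tenth of the young gen (32m), leaving 256m for
eden. PrintTenuringDistribution reports the ages of surviving objects
at each scavenge, which should help in deciding whether a higher
tenuring threshold is worth trying.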

Avoiding premature promotion of objects (besides the two points made
above) can also reduce floating garbage and reduce CMS remark pause
times (by reducing mutation rates in the old generation).

> 
> I am now running with the incremental mode CMS; but anticipate further very long pauses.

From what you described above (running CMS all the time by setting the
initiation threshold very low), it does not look as though iCMS will
buy you anything.

> 
> The VM always recovers very well after these sporadic Full GCs, but to eradicate them, should I run with an 8 GB heap or something along those lines? I also read something about killing the swap file?
> 
> My AMD 64 bit box, unfortunately for now, is restricted to 4 GB RAM; but I am adding a further 4 GB soon. I am about to go to the Solaris SPARC 64 bit and run the exact same scenario with a 7-8 GB heap.

Increasing the heap size can indeed sometimes help you avoid
concurrent mode failure from fragmentation. (But first make
sure to enable survivor spaces and, if applicable, perm gen
collection.)

> 
> I read about the occupancy fraction for OG and Perm Gen; do I need to apply this patch? Our Perm Gen is always set to 128 MB and only ever attains 108 MB.

The webrev I posted late last week should not really apply directly to
your case (except inasmuch as, in the event that you enable perm gen
collection, it might allow you to get away with not collecting the
perm gen on every cycle, and thus possibly help keep CMS remark pauses
shorter). I would not worry about this patch at the level at which you
are tuning currently (which is mainly looking to avoid the concurrent
mode failures).

-- ramki

> 
> Any feedback would help us in our endeavours to support our EBI apps in a 64 bit env.
> 
> keith
> 
> 
> Keith R Holdaway
> Java Development Technologies
> 
> SAS...  The Power to Know
> 
> Carpe Diem ...
> 




More information about the hotspot-gc-dev mailing list