significant performance degradation seen after running application for few days with CMS collector and large heap size
Atul Kshatriya
atulksh at hotmail.com
Sat Jun 16 19:45:44 UTC 2012
Hello Vladimir,
We seem to have very strong evidence that the slowness in our apps is caused by the code cache becoming full. We found this by trawling the logs of our servers.
We have also verified that typical operations in our app take significantly longer once the compiler is turned off: we demonstrated this by setting the code cache to an extremely small size, which caused the compiler to be disabled almost immediately on startup.
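The reproduction described above can be sketched as a launch line. This is a minimal sketch, not our exact command; the flag values are illustrative, and `myapp.jar` is a hypothetical placeholder for the application:

```shell
# Shrink the code cache so the JIT is disabled almost immediately,
# and print compilation events so the cutoff is visible in the log.
# (2m is an illustrative value; 6u31 defaults ReservedCodeCacheSize to 48m.)
java -server \
     -XX:ReservedCodeCacheSize=2m \
     -XX:+PrintCompilation \
     -jar myapp.jar   # myapp.jar is a placeholder for the real application
```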
What we haven't been able to fully understand is why this is happening (the code cache becoming full), since we do not seem to have any dynamically loaded components (or unloading of them) in our app. It may be that we were just at the threshold, and that new feature additions have made the codebase large enough that we are hitting the maximum. But what could explain the following: with flushing turned off, an operation runs fast at first, and then, a few days after the code cache fills up, starts getting slow. It is as if the compiled code stays cached and runs fast for some time even after the code cache is full, but eventually it is "removed" and something else takes its place. What I would have expected is that once the code cache is full, what has already been compiled keeps running fast, while newly loaded code would not be compiled and would be slow. But that does not seem to be the case. This is still a puzzle to us.
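One way to watch the code cache fill up, rather than waiting days for the slowdown, is to poll its memory pool over JMX from inside the application. A minimal sketch, assuming a HotSpot 6/7 JVM where the pool is named "Code Cache" (the class name here is our own invention):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class CodeCacheUsage {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // On HotSpot 6/7 the JIT code cache is the pool named "Code Cache";
            // newer JDKs split it into several "CodeHeap ..." pools, so we
            // match on "Code" to cover both layouts.
            if (pool.getName().contains("Code")) {
                MemoryUsage u = pool.getUsage();
                long max = u.getMax();
                if (max > 0) {
                    System.out.printf("%s: %d KB used of %d KB (%.1f%% full)%n",
                            pool.getName(), u.getUsed() / 1024, max / 1024,
                            100.0 * u.getUsed() / max);
                }
            }
        }
    }
}
```

Logging this periodically alongside request latencies would show whether the slowdown correlates with the cache approaching its reserved size.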
Anyway, we are running tests with an increased code cache as well as flushing turned on, to see whether they cause any ill effects. If those tests pass, we will try the settings out to see if they solve our problem. We are quite hopeful.
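Concretely, the test configuration amounts to adding two flags to the JVM options. This is a sketch; 128m is an arbitrary value we picked for illustration, and flushing is off by default in 6u31:

```shell
# Raise the reserved code cache above the 48m default, and let HotSpot
# evict cold compiled methods when the cache nears capacity.
java -XX:ReservedCodeCacheSize=128m \
     -XX:+UseCodeCacheFlushing \
     -jar myapp.jar   # myapp.jar is a placeholder for the real application
```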
Thank you so much for showing us the direction.
-Atul
> Date: Fri, 15 Jun 2012 11:42:05 -0700
> From: vladimir.kozlov at oracle.com
> To: atulksh at hotmail.com
> CC: hotspot-gc-dev at openjdk.java.net
> Subject: Re: significant performance degradation seen after running application for few days with CMS collector and large heap size
>
> I forgot to say that 6u31 supports CodeCache cleanup when it is almost full:
>
> -XX:+UseCodeCacheFlushing
>
> So try it also if CodeCache is your problem (the flag is off by default in 6u31).
>
> Vladimir
>
> Vladimir Kozlov wrote:
> > If it is not a full GC, maybe your code cache is full, so the JIT stops
> > compiling and you start running in the interpreter only. Look for the
> > following message in your output:
> >
> > CodeCache is full. Compiler has been disabled.
> > Try increasing the code cache size using -XX:ReservedCodeCacheSize=
> >
> > The default is 48Mb for 6u31: -XX:ReservedCodeCacheSize=48m
> >
> > Vladimir
> >
> > Atul Kshatriya wrote:
> >> Hello,
> >> We have a Tomcat based java server application configured to use the
> >> CMS collector. We have a relatively large heap - anywhere from 23Gb to
> >> upto 58Gb. What we are seeing is that after the application runs for a
> >> week or two, a request that would normally take few seconds takes tens
> >> of seconds to complete. This manifests itself as users seeing
> >> extremely slow performance from the server.
> >>
> >> We have looked at the gc logs and they do not seem to show any
> >> significant increase in pause times. We have made sure that the server
> >> is not loaded (load average is pretty minimal) and there is no other
> >> process running that would affect the server.
> >>
> >> If the application is restarted then we again get good performance
> >> until after a few days the performance deteriorates. Once that happens
> >> it does not seem to go back down again and stays at that level.
> >>
> >> The platform we are using is 64 bit x86 running SUSE (SUSE Linux
> >> Enterprise Server 11 (x86_64) VERSION = 11 PATCHLEVEL = 0).
> >> The JDK we are using is JDK 1.6 Update 31.
> >>
> >> The parameters we are using for the jvm are as follows:
> >>
> >> -Xmx58G -Xms58G -server
> >> -XX:PermSize=256m -XX:MaxPermSize=256m
> >> -XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC
> >> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> >> -XX:CMSInitiatingOccupancyFraction=70
> >> -XX:-CMSConcurrentMTEnabled -XX:+UseCompressedOops
> >> -XX:ObjectAlignmentInBytes=16
> >> -XX:+PrintTenuringDistribution -XX:PrintCMSStatistics=2
> >>
> >> We suspect this is something to do with the CMS garbage collector, but
> >> looking at the gc logs we have not been able to identify what exactly
> >> it might be. Has anybody seen such behavior, and are there any likely
> >> suspects, like memory fragmentation?
> >>
> >> Any help regarding this greatly appreciated.
> >>
> >> -Atul