<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">Hello,<BR> <BR>We are seeing intermittent pauses in our application that correlate very well with the concurrent mark phase of the CMS GC. At the outset, this seems to me to be a case of Bug # 6692906 - but given that our issue is now occuring with alarming regularity on production systems, this is not something I want to assume. <BR> <BR>Details on the environment:<BR>---------------------------<BR> <BR>The servers are running Tomcat 6.0.29 + JRE 1.6.0_20 on Windows 2003 R2 64 Bit service pack 2 servers with 8 or 16 cores (i.e., the issue has been seen on both kinds of servers). As a matter of policy, we try not to use custom JVM options unless proven absolutely necessary. The current custom options are as follows:<BR> <BR>-server -Xms1400m -Xmx1400m -Xmn350m -Xss330k -verbose:gc -XX:+DisableExplicitGC -XX:+ForceTimeHighResolution
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:../../logs/gc.log -XX:+UseParNewGC -XX:+UseConcMarkSweepGC <BR> <BR>Details on the issue:<BR>-------------------------<BR> <BR>There are really long concurrent mark phases now and then, and this causes many application threads to be paused.<BR> <BR>GC log snippet showing one case where this phase took ~ 135 seconds:<BR> <BR>848463.696: [GC [1 CMS-initial-mark: 989633K(1075200K)] 1003551K(1397760K), 6.3838946 secs] [Times: user=0.06 sys=0.01, real=6.39 secs] <BR>848470.085: [CMS-concurrent-mark-start]<BR>848605.512: [CMS-concurrent-mark: 135.377/135.426 secs] [Times: user=7.83 sys=9.92, real=135.44 secs] <BR>848605.512: [CMS-concurrent-preclean-start]<BR>848605.726: [CMS-concurrent-preclean: 0.207/0.214 secs] [Times: user=0.06 sys=0.00, real=0.22 secs] <BR>848605.726: [CMS-concurrent-abortable-preclean-start]<BR> CMS: abort preclean due to time 848610.846:
[CMS-concurrent-abortable-preclean: 1.014/5.120 secs] [Times: user=1.05 sys=0.08, real=5.13 secs] <BR>848610.871: [GC[YG occupancy: 160577 K (322560 K)]848610.871: [Rescan (parallel) , 0.1280728 secs]848610.999: [weak refs processing, 1.3120556 secs] [1 CMS-remark: 989633K(1075200K)] 1150210K(1397760K), 1.4413516 secs]<BR> <BR>In our application logs, we are able to trace an initial STW pause due to the initial mark - although already long at 6+ seconds, this *at least* makes sense (and it is generally not this long). So although not ideal, we can live with it until the bigger issue is resolved.<BR> <BR>[09-08 21:14:57] https-xxxxx (INFO) 16ms spid:295 .....<BR>[There is a six second gap in the logs, constituting a clear pause - matching the duration of the STW initial mark phase]<BR>[09-08 21:15:03] cluster-xxxxx (INFO) ....<BR> <BR>The bigger issue is that, subsequent to this, some mission-critical threads are paused as we get close to
the end of the CM phase - snippets from the application logs:<BR> <BR>09-08 21:15:03 ==> Time stamp corresponding to when concurrent marking phase begins<BR>[09-08 21:16:19] health-check (INFO) .....<BR>[There is a 60 second gap here in this thread - some other threads are executing but there is generally a great reduction in the number of active threads at this point]<BR>09-08 21:17:18 ==> Time stamp corresponding to when concurrent marking phase ends<BR>[09-08 21:17:19] health-check (INFO) .....<BR> <BR>This timing seems to suggest a clear correlation between the long duration CM phase and the pausing of the application threads. [The nature of our application is that a pausing of this thread for such a long time necessitates a controlled restart of the app, which explains you see nothing in the GC logs shortly after this point.]<BR> <BR>Questions:<BR> <BR>1\ is it clear based on the description above that the issue is
identical to 6692906 (<A href="http://bugs.sun.com/view_bug.do?bug_id=6692906">http://bugs.sun.com/view_bug.do?bug_id=6692906</A>)?<BR>2\ will we benefit by upgrading to a more recent JRE [1.6.0_26 being the one under consideration]?<BR>3\ I have seen recommendations to use "-XX:-CMSConcurrentMTEnabled" on some web forums - but I have concerns about this; if we don't allow for concurrent marking to use multiple threads, then isnt there a danger of marking proceeding so slowly that we might end up running out of memory i.e., garbage created much faster than it is collected]?<BR> <BR>Any help is greatly appreciated. Please let me know if any additional information is needed at all. I haven't attached the full GC log (it caused problems with posting) but will gladly send it directly to anybody who would like.<BR><BR>Regards,<BR>Srini.</td></tr></table>