Pauses with the CMS

Y. Srinivas Ramakrishna y.s.ramakrishna at oracle.com
Thu Oct 28 08:04:32 UTC 2010


Hi Morten --

On 10/28/2010 12:40 AM, mrk wrote:
> Hi Ramki
>
> Thanks for your answer. I have read the CR description. It mentions NIO. We do not use the NIO
> connector in Tomcat. Could the CR be affecting us anyway?

It need not be NIO, but it could be something else perhaps not directly used
by you but indirectly through the core JDK libraries that makes use of
JNI Critical Sections (NIO used to use that mechanism until recently
but no longer does).

I think it's worth a try, since the symptoms you describe are very similar to
6919638. Try the workaround or 6u23/JDK 7 latest and, if it doesn't work,
closer investigation will be warranted, at which time you will want to
open a ticket via your support channels and/or provide a test case
exhibiting the problem.

best regards.
-- ramki

>
> med venlig hilsen / Best regards,
> Morten Knudsen
>
> "Y. S. Ramakrishna" <y.s.ramakrishna at oracle.com> skrev i meddelelsen
> news:4CC84F4E.2080302 at oracle.com...
>> There were two related CR's which may be affecting you:
>>
>> (1) 6692906 CMS: parallel concurrent marking may be prone to hanging or stalling mutators for
>> periods of time
>> I do not think this is affecting you.
>>
>> (2) 6919638 CMS: ExplicitGCInvokesConcurrent misinteracts with gc locker
>>
>> I suspect it could be this latter CR that's affecting you. If so, it's
>> fixed in 6u22 and in 6u23 (and in 7 since 7bb95), so you might want to
>> try those. There's also a workaround documented in the CR.
>>
>> -- ramki
>>
>>
>> On 10/27/10 01:50, mrk wrote:
>>> Hi All
>>>
>>> Hi
>>>
>>> We have made a webapp running in JBoss 4.2.3 on RedHat Linux 5.2 on a 12 core IBM machine. Lately
>>> we have seen long response times, which seems to be caused be GC activities. The pattern is more
>>> or less as follows:
>>> All requests seem to come at normal rate, but no bytes are sent until suddenly all responses are
>>> "flushed".
>>> At same time the gc.log has this pattern:
>>>
>>> // Here the unresponsiveness starts
>>> 36788.309: [GC [1 CMS-initial-mark: 1919297K(4456448K)] 1969797K(5111808K), 0.1473460 secs]
>>> [Times: user=0.15 sys=0.00, real=0.15 secs]
>>> Total time for which application threads were stopped: 0.1530790 seconds
>>> 36788.457: [CMS-concurrent-mark-start]
>>> 36793.886: [CMS-concurrent-mark: 5.429/5.429 secs] (CMS-concurrent-mark yielded 46 times)
>>> [Times: user=6.69 sys=0.07, real=5.43 secs]
>>> 36793.886: [CMS-concurrent-preclean-start]
>>> (cardTable: 15956 cards, re-scanned 15956 cards, 1 iterations)
>>> 36793.988: [CMS-concurrent-preclean: 0.089/0.101 secs] (CMS-concurrent-preclean yielded 9 times)
>>> [Times: user=0.12 sys=0.01, real=0.10 secs]
>>> 36793.988: [CMS-concurrent-abortable-preclean-start]
>>> (cardTable: 75 cards, re-scanned 75 cards, 1 iterations)
>>> Application time: 5.6923460 seconds
>>> Total time for which application threads were stopped: 0.0170000 seconds
>>> (cardTable: 96 cards, re-scanned 96 cards, 1 iterations)
>>> (cardTable: 207 cards, re-scanned 207 cards, 1 iterations)
>>> (cardTable: 45 cards, re-scanned 45 cards, 1 iterations)
>>> (cardTable: 157 cards, re-scanned 157 cards, 1 iterations)
>>> (cardTable: 88 cards, re-scanned 88 cards, 1 iterations)
>>> (cardTable: 213 cards, re-scanned 213 cards, 1 iterations)
>>> (cardTable: 84 cards, re-scanned 84 cards, 1 iterations)
>>> (cardTable: 143 cards, re-scanned 143 cards, 1 iterations)
>>> (cardTable: 40 cards, re-scanned 40 cards, 1 iterations)
>>> (cardTable: 109 cards, re-scanned 109 cards, 1 iterations)
>>> (cardTable: 77 cards, re-scanned 77 cards, 1 iterations)
>>> (cardTable: 156 cards, re-scanned 156 cards, 1 iterations)
>>> (cardTable: 132 cards, re-scanned 132 cards, 1 iterations)
>>> (cardTable: 119 cards, re-scanned 119 cards, 1 iterations)
>>> (cardTable: 115 cards, re-scanned 115 cards, 1 iterations)
>>> (cardTable: 195 cards, re-scanned 195 cards, 1 iterations)
>>> (cardTable: 163 cards, re-scanned 163 cards, 1 iterations)
>>> (cardTable: 166 cards, re-scanned 166 cards, 1 iterations)
>>> (cardTable: 149 cards, re-scanned 149 cards, 1 iterations)
>>> (cardTable: 218 cards, re-scanned 218 cards, 1 iterations)
>>> (cardTable: 46 cards, re-scanned 46 cards, 1 iterations)
>>> (cardTable: 57 cards, re-scanned 57 cards, 1 iterations)
>>> (cardTable: 213 cards, re-scanned 213 cards, 1 iterations)
>>> (cardTable: 189 cards, re-scanned 189 cards, 1 iterations)
>>> (cardTable: 104 cards, re-scanned 104 cards, 1 iterations)
>>> (cardTable: 60 cards, re-scanned 60 cards, 1 iterations)
>>> (cardTable: 368 cards, re-scanned 368 cards, 1 iterations)
>>> (cardTable: 118 cards, re-scanned 118 cards, 1 iterations)
>>> (cardTable: 129 cards, re-scanned 129 cards, 1 iterations)
>>> (cardTable: 291 cards, re-scanned 291 cards, 1 iterations)
>>> (cardTable: 105 cards, re-scanned 105 cards, 1 iterations)
>>> (cardTable: 63 cards, re-scanned 63 cards, 1 iterations)
>>> (cardTable: 73 cards, re-scanned 73 cards, 1 iterations)
>>> (cardTable: 165 cards, re-scanned 165 cards, 1 iterations)
>>> (cardTable: 69 cards, re-scanned 69 cards, 1 iterations)
>>> (cardTable: 157 cards, re-scanned 157 cards, 1 iterations)
>>> CMS: abort preclean due to time [37 iterations, 13 waits, 4954 cards)] 36799.161:
>>> [CMS-concurrent-abortable-preclean: 2.965/5.173 secs] (CMS-concurrent-abortable-preclean yielded
>>> 166 times)
>>> [Times: user=5.12 sys=0.10, real=5.17 secs]
>>> Application time: 4.9945030 seconds
>>> 36799.172: [GC[YG occupancy: 572074 K (655360 K)]36799.172: [GC 36799.173: [ParNew
>>> Desired survivor size 67108864 bytes, new threshold 2 (max 15)
>>> - age 1: 59654872 bytes, 59654872 total
>>> - age 2: 26827232 bytes, 86482104 total
>>> - age 3: 2613360 bytes, 89095464 total
>>> - age 4: 929600 bytes, 90025064 total
>>> - age 5: 893176 bytes, 90918240 total
>>> - age 6: 795736 bytes, 91713976 total
>>> - age 7: 809240 bytes, 92523216 total
>>> - age 8: 936216 bytes, 93459432 total
>>> - age 9: 971376 bytes, 94430808 total
>>> - age 10: 624472 bytes, 95055280 total
>>> - age 11: 883616 bytes, 95938896 total
>>> - age 12: 817296 bytes, 96756192 total
>>> - age 13: 977992 bytes, 97734184 total
>>> - age 14: 733816 bytes, 98468000 total
>>> - age 15: 654744 bytes, 99122744 total
>>> : 572074K->110988K(655360K), 0.1855140 secs] 2491372K->2031059K(5111808K) icms_dc=0 , 0.1866380
>>> secs] [Times: user=0.72 sys=0.00, real=0.19 secs]
>>> 36799.359: [Rescan (parallel) (Survivor:12chunks) Finished young gen rescan work in 3th thread:
>>> 0.032 sec
>>> Finished young gen rescan work in 2th thread: 0.035 sec
>>> Finished young gen rescan work in 0th thread: 0.035 sec
>>> Finished young gen rescan work in 1th thread: 0.035 sec
>>> Finished remaining root rescan work in 1th thread: 0.030 sec
>>> Finished remaining root rescan work in 3th thread: 0.033 sec
>>> Finished remaining root rescan work in 2th thread: 0.032 sec
>>> Finished remaining root rescan work in 0th thread: 0.042 sec
>>> Finished dirty card rescan work in 2th thread: 0.029 sec
>>> Finished dirty card rescan work in 3th thread: 0.031 sec
>>> Finished dirty card rescan work in 0th thread: 0.019 sec
>>> Finished dirty card rescan work in 1th thread: 0.031 sec
>>> Finished work stealing in 3th thread: 0.000 sec
>>> Finished work stealing in 0th thread: 0.000 sec
>>> Finished work stealing in 2th thread: 0.000 sec
>>> Finished work stealing in 1th thread: 0.000 sec
>>> , 0.0959640 secs]36799.455: [weak refs processing, 0.0121220 secs] [1 CMS-remark:
>>> 1920070K(4456448K)] 2031059K(5111808K), 0.2952350 secs] [Times: user=1.11 sys=0.00, real=0.30 secs]
>>> Total time for which application threads were stopped: 0.3066430 seconds
>>> 36799.468: [CMS-concurrent-sweep-start]
>>> 36805.824: [CMS-concurrent-sweep: 6.355/6.356 secs] (CMS-concurrent-sweep yielded 515 times)
>>> [Times: user=8.76 sys=0.24, real=6.35 secs]
>>> 36805.824: [CMS-concurrent-reset-start]
>>> 36805.903: [CMS-concurrent-reset: 0.079/0.079 secs] (CMS-concurrent-reset yielded 0 times)
>>> [Times: user=0.64 sys=0.01, real=0.07 secs]
>>> Application time: 6.5590510 seconds
>>> // About here the server starts sending out responses
>>> 36806.044: [GC 36806.045: [ParNew
>>> Desired survivor size 67108864 bytes, new threshold 2 (max 15)
>>> - age 1: 55456640 bytes, 55456640 total
>>> - age 2: 54278984 bytes, 109735624 total
>>> : 635276K->131072K(655360K), 0.3230430 secs] 2554507K->2089990K(5111808K) icms_dc=0 , 0.3242750
>>> secs] [Times: user=1.02 sys=0.02, real=0.33 secs]
>>> Total time for which application threads were stopped: 0.3422430 seconds
>>> Application time: 0.7628480 seconds
>>>
>>> We have tried various GC setups, but we always end up more or less like this.
>>> Our current parameters:
>>> -Xms4096m -Xmx5120m -XX:MaxPermSize=384m
>>> -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent
>>> -XX:+PrintClassHistogram -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
>>> -XX:+PrintGCDetails
>>> -XX:ReservedCodeCacheSize=64m -XX:NewSize=768m -XX:SurvivorRatio=4
>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:+CMSIncrementalMode
>>> -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
>>> -XX:+UseTLAB -XX:ParallelGCThreads=4
>>> -XX:+UseParNewGC -XX:MaxGCPauseMillis=1000 -XX:GCTimeRatio=10 -XX:+CMSPermGenSweepingEnabled
>>> -XX:+PrintCMSInitiationStatistics -XX:PrintCMSStatistics=1 -XX:+PrintGCTaskTimeStamps
>>> -XX:+PrintGCTimeStamps
>>> -verbose:gc -XX:+TraceClassLoading -XX:+TraceClassUnloading
>>> -XX:+PrintTenuringDistribution
>>>
>>> How can we avoid the unresponsiveness?
>>>
>>> We have tried with both JDK 1.6.0_06 and JDK 1.6.0_021.
>>>
>>> It seems that the problem have a lot in common with:
>>> http://old.nabble.com/long-unaccounted-pause-during-concurrent-mark-phase-of-ParNew-cycle-to27459365.html#a27463304
>>> But setting -XX:-CMSConcurrentMTEnabled does not seem to help?
>>>
>>> med venlig hilsen / Best regards,
>>> Morten Knudsen
>>>
>>>
>>>
>>>
>>>
>>
>
>




More information about the hotspot-gc-dev mailing list