long unaccounted pause during concurrent mark phase of ParNew cycle

Y. Srinivas Ramakrishna Y.S.Ramakrishna at Sun.COM
Fri Feb 5 06:05:49 UTC 2010


Hello Atul --

Could you add -XX:-CMSConcurrentMTEnabled to the mix
and see if it affects the behaviour in any manner?

PS: I notice that you are using -XX:-CMSParallelRemarkEnabled.
What was the reason for using that flag?

thanks.
-- ramki
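
One way to compare runs with and without the flag is to pull the per-thread "Finished ... in Nth thread" timings out of the GC log, as in the excerpt quoted below. This is a minimal Python sketch, not a definitive tool: the function name `slow_phases`, the 1-second threshold and the regular expression are assumptions made for illustration. The thread does not confirm whether these per-thread lines are still printed when -XX:-CMSConcurrentMTEnabled is set.

```python
import re
from collections import defaultdict

# Matches lines such as "Finished perm space scanning in 3th thread: 32.153 sec".
# search() is used below so a leading "> " quote marker does not matter.
PHASE_LINE = re.compile(r"Finished (.+?) in (\d+)th thread: (\d+\.\d+) sec")


def slow_phases(lines, threshold_secs=1.0):
    """Group per-thread phase timings by phase name, keeping those above the threshold."""
    slow = defaultdict(list)
    for line in lines:
        m = PHASE_LINE.search(line)
        if m and float(m.group(3)) > threshold_secs:
            slow[m.group(1)].append((int(m.group(2)), float(m.group(3))))
    return dict(slow)


# Lines copied from the log excerpt in the quoted message.
sample = [
    "> Finished perm space scanning in 2th thread: 0.367 sec",
    "> Finished perm space scanning in 0th thread: 0.362 sec",
    "> Finished perm space scanning in 3th thread: 32.153 sec",
    "> Finished perm space scanning in 1th thread: 32.097 sec",
    "> Finished work stealing in 1th thread: 0.000 sec",
    "> Finished work stealing in 3th thread: 0.002 sec",
    "> Finished work stealing in 2th thread: 31.791 sec",
    "> Finished work stealing in 0th thread: 31.791 sec",
]

for phase, entries in slow_phases(sample).items():
    print(phase, entries)
```

On the sample lines this reports threads 3 and 1 for perm space scanning and threads 2 and 0 for work stealing, each at roughly 32 seconds.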

atulksh wrote:
> Hi Jon,
> I have some more (probably more precise) information for you. I have been
> able to reproduce this issue in our test environment (up till now we were
> only seeing it in our production setup). Along with the parameters that I
> have specified above I added the following.
> 
> 
> 
> After loading up the app server (it is Tomcat 6.0.14) I got a pause of about
> 32 seconds during the CMS cycle. The way I noticed the pause is like this.
> I kept running a script that continuously does a basic page refresh (a small
> data page) in the application. This is like a ping to the server. In
> parallel, I run operations that load up the memory in the application
> server. These are the real requests that users make, whose memory requirement
> is usually high, and they drive the app server to the CMS threshold.
> 
> In my case, once the CMS cycle kicked in, I saw that while the application
> was in the "pause", no requests from my script were processed. They were
> just queued up for Tomcat, meaning Apache got those requests and logged that
> they were sent to Tomcat, but Tomcat did not register receiving them. After
> the app came out of the "pause", everything came back and all the pending
> requests got processed, as if the app had just been unfrozen.
> 
> During this freeze, what I see in the GC log is that the CMS cycle is almost
> at the end of the concurrent mark phase. The GC is doing the "perm space
> scanning". The pause coincides exactly with the time reported for the "perm
> space scanning". I am attaching the full GC log snippet right from the start
> of the CMS cycle. Below is just the place where the pause appears.
> 
> 28676.784: [GC 28676.784: [ParNew
> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
> - age   1:    3742608 bytes,    3742608 total
> - age   2:    3451152 bytes,    7193760 total
> : 76672K->8512K(76672K), 0.2045430 secs] 20262376K->20197311K(27254464K),
> 0.2048030 secs] [Times: user=0.77 sys=0.01, real=0.21 secs]
> Total time for which application threads were stopped: 0.2055080 seconds
> Finished perm space scanning in 2th thread: 0.367 sec
> Finished perm space scanning in 0th thread: 0.362 sec
> Application time: 0.1481860 seconds
> 28708.626: [GC 28708.627: [ParNew
> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
> - age   1:    2079760 bytes,    2079760 total
> - age   2:    3657248 bytes,    5737008 total
> : 76672K->6795K(76672K), 0.1937080 secs] 20265471K->20198965K(27254464K),
> 0.1939480 secs] [Times: user=0.74 sys=0.00, real=0.20 secs]
> Total time for which application threads were stopped: 0.1948820 seconds
> Finished perm space scanning in 3th thread: 32.153 sec
> Finished perm space scanning in 1th thread: 32.097 sec
> Finished work stealing in 1th thread: 0.000 sec
> Finished work stealing in 3th thread: 0.002 sec
> Finished work stealing in 2th thread: 31.791 sec
> Finished work stealing in 0th thread: 31.791 sec
> 28708.825: [CMS-concurrent-mark: 88.906/158.952 secs] (CMS-concurrent-mark
> yielded 454 times)
>  [Times: user=544.33 sys=4.59, real=158.95 secs]
> 
> What is "perm space scanning", and is this where the GC would freeze the
> application (STW type)? Is this a bug or something that is expected?
> 
> Please let me know. This has become quite critical for our application.
> Please see the full GC log file attached. Your help is very much appreciated.
> 
> Thank you,
> 
> Atul
> 
> 
> Jon Masamitsu wrote:
>> Atul,
>>
>> Do you mean the 36 secs here?
>>
>>  > Thu Feb  4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715 secs] 
>> [Times:
>>  > user=56.34 sys=0.34, real=36.71 secs]
>>
>> That is a concurrent phase (i.e., the application threads
>> are still running although they have to share the hardware
>> threads with the concurrent GC threads).
>>
>> Jon
>>
>> On 02/04/10 12:37, atulksh wrote:
>>> Hi,
>>> We have an application that is freezing (pausing) for a very long time
>>> (tens of seconds) during the concurrent mark phase of the ParNew cycle.
>>> This pause is consistently seen when there is a good amount of activity
>>> on the server. We tried upgrading the JVM (JRE 6) from Update 02 to
>>> Update 17, keeping all the parameters the same, but the pauses did not go
>>> away (they became more consistent and slightly longer). We even tried
>>> enabling the CompressedOops option, thinking that it would reduce memory
>>> requirements, but that has not helped.
>>>
>>> We are running on a 64-bit SUSE 10.1 platform, on an 8-core machine with
>>> 32 GB of RAM. 26g is allocated to the Java process. The full set of VM
>>> parameters is here:
>>>
>>>
>>> -Xmx26g
>>> -Xms26g
>>> -server 
>>> -XX:+UseConcMarkSweepGC
>>> -XX:PermSize=256m
>>> -XX:MaxPermSize=256m
>>> -XX:+PrintTenuringDistribution
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps
>>> -XX:CMSInitiatingOccupancyFraction=70
>>> -XX:-CMSParallelRemarkEnabled
>>>
>>> The nature of the pause is like this. All the application threads that are
>>> being processed seem to stop responding. New requests that come to our app
>>> server are not processed and get queued. When the pause ends, everything
>>> frees up and seems to go on normally from then on. In the GC log you see a
>>> gap between the ParNews. The following are two examples of the pauses.
>>> Notice the pause from 10:13:06 to 10:13:35 (29 seconds) in the first one
>>> and from 11:29:04 to 11:29:17 (13 seconds) in the second one. These pauses
>>> only occur in the concurrent mark phase. They do not seem to occur at any
>>> other time or anywhere else in the CMS cycle.
>>>
>>> ....
>>> Thu Feb  4 10:12:59 2010: [GC [1 CMS-initial-mark: 18976170K(27092608K)]
>>> 18989786K(27245952K), 0.0059140 secs] [Times: user=0.01 sys=0.00,
>>> real=0.01
>>> secs] 
>>> Thu Feb  4 10:12:59 2010: [CMS-concurrent-mark-start]
>>> Thu Feb  4 10:12:59 2010 [18,980,009 K]: [GC 42035.293: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>> 1833064
>>> bytes,    1833064 total - age   2:      18472 bytes,    1851536 total :
>>> 148315K->3839K(153344K), 0.0400230 secs] 19124486K->18980009K(27245952K),
>>> 0.0401580 secs] [Times: user=0.25 sys=0.00, real=0.04 secs] 
>>> Thu Feb  4 10:12:59 2010 [18,979,036 K]: [GC 42035.415: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>> 2068376
>>> bytes,    2068376 total - age   2:      23792 bytes,    2092168 total -
>>> age  
>>> 3:      18472 bytes,    2110640 total : 140159K->2866K(153344K),
>>> 0.0352200
>>> secs] 19116329K->18979036K(27245952K), 0.0353520 secs] [Times: user=0.29
>>> sys=0.00, real=0.03 secs] 
>>> Thu Feb  4 10:12:59 2010 [18,979,103 K]: [GC 42035.535: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>> 1782608
>>> bytes,    1782608 total - age   2:      22032 bytes,    1804640 total -
>>> age  
>>> 3:      23792 bytes,    1828432 total - age   4:      18472 bytes,   
>>> 1846904 total : 139186K->2933K(153344K), 0.0498480 secs]
>>> 19115356K->18979103K(27245952K), 0.0499850 secs] [Times: user=0.24
>>> sys=0.00,
>>> real=0.05 secs] 
>>> Thu Feb  4 10:12:59 2010 [18,979,181 K]: [GC 42035.673: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>> 2161000
>>> bytes,    2161000 total - age   2:      22032 bytes,    2183032 total -
>>> age  
>>> 3:      22032 bytes,    2205064 total - age   4:      23792 bytes,   
>>> 2228856 total : 139253K->2992K(153344K), 0.0400420 secs]
>>> 19115423K->18979181K(27245952K), 0.0401730 secs] [Times: user=0.26
>>> sys=0.00,
>>> real=0.04 secs] 
>>> Thu Feb  4 10:13:06 2010 [18,979,876 K]: [GC 42042.243: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>> 2113272
>>> bytes,    2113272 total - age   2:      23792 bytes,    2137064 total -
>>> age  
>>> 3:      22032 bytes,    2159096 total - age   4:      22032 bytes,   
>>> 2181128 total : 138855K->3662K(153344K), 0.0465500 secs]
>>> 19115044K->18979876K(27245952K), 0.0466840 secs] [Times: user=0.27
>>> sys=0.00,
>>> real=0.04 secs] 
>>> Thu Feb  4 10:13:35 2010 [18,979,051 K]: [GC 42071.809: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:    
>>> 219056
>>> bytes,     219056 total - age   2:    1867632 bytes,    2086688 total -
>>> age  
>>> 3:      23792 bytes,    2110480 total - age   4:      22032 bytes,   
>>> 2132512 total : 139982K->2814K(153344K), 0.0426920 secs]
>>> 19116196K->18979051K(27245952K), 0.0428200 secs] [Times: user=0.28
>>> sys=0.00,
>>> real=0.05 secs] 
>>> Thu Feb  4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715 secs]
>>> [Times:
>>> user=56.34 sys=0.34, real=36.71 secs] 
>>> ....
>>>
>>>
>>> ....
>>> Thu Feb  4 11:28:59 2010: [GC [1 CMS-initial-mark: 19012252K(27092608K)]
>>> 19032719K(27245952K), 0.0279440 secs] [Times: user=0.02 sys=0.00,
>>> real=0.03
>>> secs] 
>>> Thu Feb  4 11:28:59 2010: [CMS-concurrent-mark-start]
>>> Thu Feb  4 11:28:59 2010 [19,154,661 K]: [GC 46595.614: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>> 17431976
>>> bytes,   17431976 total : 153344K->17024K(153344K), 0.9259390 secs]
>>> 19165596K->19154661K(27245952K), 0.9260890 secs] [Times: user=1.23
>>> sys=0.05,
>>> real=0.93 secs] 
>>> Thu Feb  4 11:29:00 2010 [19,199,696 K]: [GC 46596.973: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>> 17448432
>>> bytes,   17448432 total : 153344K->17024K(153344K), 0.3054460 secs]
>>> 19290981K->19199696K(27245952K), 0.3055930 secs] [Times: user=0.79
>>> sys=0.07,
>>> real=0.31 secs] 
>>> Thu Feb  4 11:29:01 2010 [19,229,358 K]: [GC 46597.722: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>> 10331408
>>> bytes,   10331408 total : 153344K->17024K(153344K), 0.1449040 secs]
>>> 19336016K->19229358K(27245952K), 0.1450630 secs] [Times: user=0.59
>>> sys=0.02,
>>> real=0.15 secs] 
>>> Thu Feb  4 11:29:02 2010 [19,240,015 K]: [GC 46598.038: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>> 17354120
>>> bytes,   17354120 total : 153344K->17024K(153344K), 0.0935440 secs]
>>> 19365678K->19240015K(27245952K), 0.0936870 secs] [Times: user=0.59
>>> sys=0.00,
>>> real=0.10 secs] 
>>> Thu Feb  4 11:29:02 2010 [19,302,509 K]: [GC 46598.347: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>> 17432360
>>> bytes,   17432360 total : 151519K->17024K(153344K), 0.4619430 secs]
>>> 19374510K->19302509K(27245952K), 0.4620890 secs] [Times: user=1.19
>>> sys=0.02,
>>> real=0.46 secs] 
>>> Thu Feb  4 11:29:03 2010 [19,329,594 K]: [GC 46599.030: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>> 11933296
>>> bytes,   11933296 total : 153344K->17024K(153344K), 0.1492670 secs]
>>> 19438829K->19329594K(27245952K), 0.1494200 secs] [Times: user=0.77
>>> sys=0.03,
>>> real=0.15 secs] 
>>> Thu Feb  4 11:29:03 2010 [19,337,176 K]: [GC 46599.423: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>> 5719032
>>> bytes,    5719032 total : 153344K->12852K(153344K), 0.0675180 secs]
>>> 19465914K->19337176K(27245952K), 0.0676670 secs] [Times: user=0.46
>>> sys=0.02,
>>> real=0.07 secs] 
>>> Thu Feb  4 11:29:03 2010 [19,338,256 K]: [GC 46599.754: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age   1:   
>>> 6783616
>>> bytes,    6783616 total - age   2:    5570080 bytes,   12353696 total :
>>> 149172K->13931K(153344K), 0.0650520 secs]
>>> 19473496K->19338256K(27245952K),
>>> 0.0652060 secs] [Times: user=0.42 sys=0.01, real=0.07 secs] 
>>> Thu Feb  4 11:29:04 2010 [19,340,264 K]: [GC 46600.144: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age   1:   
>>> 2190136
>>> bytes,    2190136 total - age   2:    6762648 bytes,    8952784 total :
>>> 150251K->10375K(153344K), 0.0709470 secs]
>>> 19474576K->19340264K(27245952K),
>>> 0.0710920 secs] [Times: user=0.48 sys=0.01, real=0.07 secs] 
>>> Thu Feb  4 11:29:17 2010 [19,345,928 K]: [GC 46613.628: [ParNew Desired
>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:    
>>> 760488
>>> bytes,     760488 total - age   2:    2173472 bytes,    2933960 total :
>>> 146695K->9279K(153344K), 0.0689390 secs] 19476584K->19345928K(27245952K),
>>> 0.0691070 secs] [Times: user=0.45 sys=0.00, real=0.07 secs] 
>>> Thu Feb  4 11:29:17 2010: [CMS-concurrent-mark: 15.853/18.260 secs]
>>> [Times:
>>> user=29.26 sys=0.55, real=18.26 secs] 
>>> Thu Feb  4 11:29:17 2010: [CMS-concurrent-preclean-start]
>>> ....
>>>
>>> Any idea what may be causing these pauses and what JVM tuning would be
>>> necessary to avoid them? Your help is greatly appreciated.
>>>
>>> Thank you very much.
>>>
>>> Atul
>>
> http://old.nabble.com/file/p27463304/long.pause.in.cms.gc.log
> long.pause.in.cms.gc.log 
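
The "gap between the ParNews" described in the original message can be located mechanically in a log like the attached one. The following is a minimal Python sketch under stated assumptions: the function name `parnew_gaps`, the 10-second threshold and the regular expression are illustrative choices, and the pattern relies only on the "[GC <uptime>: [ParNew" form that appears in both log excerpts in this thread.

```python
import re

# Captures the JVM uptime stamp printed right after "[GC", e.g.
# "[GC 42042.243: [ParNew". This form appears in both excerpts in the thread.
PARNEW_START = re.compile(r"\[GC (\d+\.\d+): \[ParNew")


def parnew_gaps(lines, threshold_secs=10.0):
    """Return (previous_start, next_start, gap) for consecutive ParNew
    collections whose start times are more than threshold_secs apart."""
    starts = []
    for line in lines:
        m = PARNEW_START.search(line)
        if m:
            starts.append(float(m.group(1)))
    gaps = []
    for prev, cur in zip(starts, starts[1:]):
        gap = cur - prev
        if gap > threshold_secs:
            gaps.append((prev, cur, round(gap, 3)))
    return gaps


# Line prefixes copied from the first example in the original message.
sample = [
    ">>> Thu Feb  4 10:12:59 2010 [18,979,181 K]: [GC 42035.673: [ParNew Desired",
    ">>> Thu Feb  4 10:13:06 2010 [18,979,876 K]: [GC 42042.243: [ParNew Desired",
    ">>> Thu Feb  4 10:13:35 2010 [18,979,051 K]: [GC 42071.809: [ParNew Desired",
]

for prev, cur, gap in parnew_gaps(sample):
    print(f"gap of {gap} s between ParNew at {prev} and {cur}")
```

On these three lines only the interval between 42042.243 and 42071.809 exceeds the threshold, which matches the 10:13:06 to 10:13:35 pause noted in the message. A gap found this way only shows that no young collection started in that interval; it does not by itself show that application threads were stopped.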



