long unaccounted pause during concurrent mark phase of ParNew cycle

atulksh atulksh at hotmail.com
Fri Feb 5 17:13:24 UTC 2010


Hi Jon,
Also to give a bit more information, we have been using Update 2 on our
servers since more than 2 years now. In that period we upgraded our servers
from 8Gb to 32Gb (current).

Thank you,

Atul


atulksh wrote:
> 
> I see. Sorry for not getting your previous question. Actually before
> Update 02 we did not see this issue. But that was a very different
> situation altogether since our servers did not have that much memory
> allocated to them. Only recently we have seen this issue come to surface
> because perhaps it has now become very noticeable.
> 
> One correlation that we have found is that the pauses occur when there is
> significant load on the system (in terms of user activity - not
> necessarily user hits). When the system is relatively less uses the pauses
> are also relatively less.
> 
> Thank you,
> 
> Atul
> 
> 
> Jon Masamitsu wrote:
>> 
>> 
>> 
>> On 02/05/10 08:47, atulksh wrote:
>>> Hi Jon,
>>> We are currently still using Update 2 on our servers. We see the same
>>> kind
>>> of pause behavior on Update 2 as well as Update 17. The pause behavior
>>> does
>>> not seem to change between these two versions.
>> 
>> Did you notice this behavior after upgrading to jdk6 update 2 (i.e.,
>> did it work OK on a jdk5 release or earlier)?
>> 
>>> 
>>> Thank you,
>>> 
>>> Atul
>>> 
>>> 
>>> Jon Masamitsu wrote:
>>>> Atul,
>>>>
>>>> You mentioned that you were on jdk6 update 2 initially.
>>>> Did you see this behavior with earlier releases?
>>>>
>>>> Jon
>>>>
>>>> On 02/04/10 20:02, atulksh wrote:
>>>>> Hi Jon,
>>>>> I have some more (probably more precise) information for you. I have
>>>>> been
>>>>> able to reproduce this issue in our test environment (up till now we
>>>>> were
>>>>> only seeing it in our production setup). Along with the parameters
>>>>> that I
>>>>> have specified above I added the following.
>>>>>
>>>>>
>>>>>
>>>>> After loading up the app server (it is Tomcat 6.0.14) I got a pause
>>>>> during
>>>>> the CMS cycle of about 32 seconds. The way I noticed the pause is like
>>>>> this.
>>>>> I kept on running a script that does some basic page refresh (small
>>>>> data
>>>>> page) in the application continously. This is like a ping to the
>>>>> server.
>>>>> Then I parallely do operations that load up the memory in the
>>>>> application
>>>>> server. These are the real hits that users do in whom memory
>>>>> requirement
>>>>> is
>>>>> usually high and that drives the app server to the CMS threshold.
>>>>>
>>>>> In my case once the CMS kicked in, I saw that the period where the
>>>>> applilcation was in the "pause" no requests of my script were
>>>>> processed.
>>>>> They were just queued up to Tomcat - meaning Apache got those requests
>>>>> and
>>>>> logged that they were sent to Tomcat. But Tomcat did not register of
>>>>> receiving them. Then after the app went out of the "pause" then
>>>>> everything
>>>>> came back and all these pending requests got processed - like the app
>>>>> just
>>>>> had been unfrozen.
>>>>>
>>>>> During this freeze, what I see in the GC log is that the CMS cycle is
>>>>> almost
>>>>> at the end of the concurrent mark phase. It is doing the GC is doing
>>>>> the
>>>>> "perm space scanning". The pause exactly coincides with the time
>>>>> reported
>>>>> for the "perm space scanning". I am attaching the full gc log snippt
>>>>> right
>>>>> from the start of the CMS cycle. Below is just the place where the
>>>>> pause
>>>>> appears.
>>>>>
>>>>> 28676.784: [GC 28676.784: [ParNew
>>>>> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
>>>>> - age   1:    3742608 bytes,    3742608 total
>>>>> - age   2:    3451152 bytes,    7193760 total
>>>>> : 76672K->8512K(76672K), 0.2045430 secs]
>>>>> 20262376K->20197311K(27254464K),
>>>>> 0.2048030 secs] [Times: user=0.77 sys=0.01, real=0.21 secs]
>>>>> Total time for which application threads were stopped: 0.2055080
>>>>> seconds
>>>>> Finished perm space scanning in 2th thread: 0.367 sec
>>>>> Finished perm space scanning in 0th thread: 0.362 sec
>>>>> Application time: 0.1481860 seconds
>>>>> 28708.626: [GC 28708.627: [ParNew
>>>>> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
>>>>> - age   1:    2079760 bytes,    2079760 total
>>>>> - age   2:    3657248 bytes,    5737008 total
>>>>> : 76672K->6795K(76672K), 0.1937080 secs]
>>>>> 20265471K->20198965K(27254464K),
>>>>> 0.1939480 secs] [Times: user=0.74 sys=0.00, real=0.20 secs]
>>>>> Total time for which application threads were stopped: 0.1948820
>>>>> seconds
>>>>> Finished perm space scanning in 3th thread: 32.153 sec
>>>>> Finished perm space scanning in 1th thread: 32.097 sec
>>>>> Finished work stealing in 1th thread: 0.000 sec
>>>>> Finished work stealing in 3th thread: 0.002 sec
>>>>> Finished work stealing in 2th thread: 31.791 sec
>>>>> Finished work stealing in 0th thread: 31.791 sec
>>>>> 28708.825: [CMS-concurrent-mark: 88.906/158.952 secs]
>>>>> (CMS-concurrent-mark
>>>>> yielded 454 times)
>>>>>  [Times: user=544.33 sys=4.59, real=158.95 secs]
>>>>>
>>>>> What is "perm space scanning" and is this where the gc would freeze
>>>>> the
>>>>> application (STW type). Is this a bug or something that is expected?
>>>>>
>>>>> Please let me know. This has become quite critical for our
>>>>> application.
>>>>> Please see the full gc log file attached. Your help very much
>>>>> appreciated.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Atul
>>>>>
>>>>>
>>>>> Jon Masamitsu wrote:
>>>>>> Atul,
>>>>>>
>>>>>> Do you mean the 36 secs here?
>>>>>>
>>>>>>  > Thu Feb  4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715
>>>>>> secs] 
>>>>>> [Times:
>>>>>>  > user=56.34 sys=0.34, real=36.71 secs]
>>>>>>
>>>>>> That is a concurrent phase (i.e., the application threads
>>>>>> are still running although they have to share the hardware
>>>>>> threads with the concurrent GC threads).
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> On 02/04/10 12:37, atulksh wrote:
>>>>>>> Hi,
>>>>>>> We have an application that is freezing (pausing) for a very long
>>>>>>> (tens
>>>>>>> of
>>>>>>> seconds) time during the concurrent mark phase of the ParNew cycle.
>>>>>>> This
>>>>>>> pause is consistently seen when there is good amount of activity on
>>>>>>> the
>>>>>>> server. We tried to upgrade the JVM (JRE 6) from Update 02 to Update
>>>>>>> 17
>>>>>>> keeping all the parameters same, but the pauses did not go away
>>>>>>> (they
>>>>>>> became
>>>>>>> more consistent and slightly longer). We even tried to enable
>>>>>>> CompressedOops
>>>>>>> option, thinking that that would reduce memory requirements, but
>>>>>>> that
>>>>>>> has
>>>>>>> not helped.
>>>>>>>
>>>>>>> We are running on 64bit SUSE 10.1 platform with 32Gb of RAM and 8
>>>>>>> core
>>>>>>> machine. 26g is allocated to the java process. The full set of VM
>>>>>>> parameters
>>>>>>> are here
>>>>>>>
>>>>>>>
>>>>>>> -Xmx26g
>>>>>>> -Xms26g
>>>>>>> -server 
>>>>>>> -XX:+UseConcMarkSweepGC
>>>>>>> -XX:PermSize=256m
>>>>>>> -XX:MaxPermSize=256m
>>>>>>> -XX:+PrintTenuringDistribution
>>>>>>> -XX:+PrintGCDetails
>>>>>>> -XX:+PrintGCTimeStamps
>>>>>>> -XX:CMSInitiatingOccupancyFraction=70
>>>>>>> -XX:-CMSParallelRemarkEnabled
>>>>>>>
>>>>>>> The nature of the pause is like this. All the application threads
>>>>>>> that
>>>>>>> are
>>>>>>> being processed seem to stop responding. New requests that come to
>>>>>>> our
>>>>>>> app
>>>>>>> server are not processed and get queued. When the pause ends then
>>>>>>> everything
>>>>>>> frees up and seems to go on normally henceforth. In the GC you see a
>>>>>>> gap
>>>>>>> between the ParNews. The following are two examples of the pauses.
>>>>>>> Notice
>>>>>>> the pause from 10:13:06 to 10:13:35 (29 seconds) in the first one
>>>>>>> and
>>>>>>> 11:29:04 to 11:29:17 (13 seconds) in the second one. These pauses
>>>>>>> only
>>>>>>> occur
>>>>>>> in the concurrent mark phase. They do not seem to occur at any other
>>>>>>> time
>>>>>>> or
>>>>>>> anywhere else in the CMS cycle.
>>>>>>>
>>>>>>> ....
>>>>>>> Thu Feb  4 10:12:59 2010: [GC [1 CMS-initial-mark:
>>>>>>> 18976170K(27092608K)]
>>>>>>> 18989786K(27245952K), 0.0059140 secs] [Times: user=0.01 sys=0.00,
>>>>>>> real=0.01
>>>>>>> secs] 
>>>>>>> Thu Feb  4 10:12:59 2010: [CMS-concurrent-mark-start]
>>>>>>> Thu Feb  4 10:12:59 2010 [18,980,009 K]: [GC 42035.293: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>>>>>> 1833064
>>>>>>> bytes,    1833064 total - age   2:      18472 bytes,    1851536
>>>>>>> total :
>>>>>>> 148315K->3839K(153344K), 0.0400230 secs]
>>>>>>> 19124486K->18980009K(27245952K),
>>>>>>> 0.0401580 secs] [Times: user=0.25 sys=0.00, real=0.04 secs] 
>>>>>>> Thu Feb  4 10:12:59 2010 [18,979,036 K]: [GC 42035.415: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>>>>>> 2068376
>>>>>>> bytes,    2068376 total - age   2:      23792 bytes,    2092168
>>>>>>> total -
>>>>>>> age  
>>>>>>> 3:      18472 bytes,    2110640 total : 140159K->2866K(153344K),
>>>>>>> 0.0352200
>>>>>>> secs] 19116329K->18979036K(27245952K), 0.0353520 secs] [Times:
>>>>>>> user=0.29
>>>>>>> sys=0.00, real=0.03 secs] 
>>>>>>> Thu Feb  4 10:12:59 2010 [18,979,103 K]: [GC 42035.535: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>>>>>> 1782608
>>>>>>> bytes,    1782608 total - age   2:      22032 bytes,    1804640
>>>>>>> total -
>>>>>>> age  
>>>>>>> 3:      23792 bytes,    1828432 total - age   4:      18472 bytes,   
>>>>>>> 1846904 total : 139186K->2933K(153344K), 0.0498480 secs]
>>>>>>> 19115356K->18979103K(27245952K), 0.0499850 secs] [Times: user=0.24
>>>>>>> sys=0.00,
>>>>>>> real=0.05 secs] 
>>>>>>> Thu Feb  4 10:12:59 2010 [18,979,181 K]: [GC 42035.673: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>>>>>> 2161000
>>>>>>> bytes,    2161000 total - age   2:      22032 bytes,    2183032
>>>>>>> total -
>>>>>>> age  
>>>>>>> 3:      22032 bytes,    2205064 total - age   4:      23792 bytes,   
>>>>>>> 2228856 total : 139253K->2992K(153344K), 0.0400420 secs]
>>>>>>> 19115423K->18979181K(27245952K), 0.0401730 secs] [Times: user=0.26
>>>>>>> sys=0.00,
>>>>>>> real=0.04 secs] 
>>>>>>> Thu Feb  4 10:13:06 2010 [18,979,876 K]: [GC 42042.243: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>>>>>> 2113272
>>>>>>> bytes,    2113272 total - age   2:      23792 bytes,    2137064
>>>>>>> total -
>>>>>>> age  
>>>>>>> 3:      22032 bytes,    2159096 total - age   4:      22032 bytes,   
>>>>>>> 2181128 total : 138855K->3662K(153344K), 0.0465500 secs]
>>>>>>> 19115044K->18979876K(27245952K), 0.0466840 secs] [Times: user=0.27
>>>>>>> sys=0.00,
>>>>>>> real=0.04 secs] 
>>>>>>> Thu Feb  4 10:13:35 2010 [18,979,051 K]: [GC 42071.809: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:    
>>>>>>> 219056
>>>>>>> bytes,     219056 total - age   2:    1867632 bytes,    2086688
>>>>>>> total -
>>>>>>> age  
>>>>>>> 3:      23792 bytes,    2110480 total - age   4:      22032 bytes,   
>>>>>>> 2132512 total : 139982K->2814K(153344K), 0.0426920 secs]
>>>>>>> 19116196K->18979051K(27245952K), 0.0428200 secs] [Times: user=0.28
>>>>>>> sys=0.00,
>>>>>>> real=0.05 secs] 
>>>>>>> Thu Feb  4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715 secs]
>>>>>>> [Times:
>>>>>>> user=56.34 sys=0.34, real=36.71 secs] 
>>>>>>> ....
>>>>>>>
>>>>>>>
>>>>>>> ....
>>>>>>> Thu Feb  4 11:28:59 2010: [GC [1 CMS-initial-mark:
>>>>>>> 19012252K(27092608K)]
>>>>>>> 19032719K(27245952K), 0.0279440 secs] [Times: user=0.02 sys=0.00,
>>>>>>> real=0.03
>>>>>>> secs] 
>>>>>>> Thu Feb  4 11:28:59 2010: [CMS-concurrent-mark-start]
>>>>>>> Thu Feb  4 11:28:59 2010 [19,154,661 K]: [GC 46595.614: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>>>>>> 17431976
>>>>>>> bytes,   17431976 total : 153344K->17024K(153344K), 0.9259390 secs]
>>>>>>> 19165596K->19154661K(27245952K), 0.9260890 secs] [Times: user=1.23
>>>>>>> sys=0.05,
>>>>>>> real=0.93 secs] 
>>>>>>> Thu Feb  4 11:29:00 2010 [19,199,696 K]: [GC 46596.973: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>>>>>> 17448432
>>>>>>> bytes,   17448432 total : 153344K->17024K(153344K), 0.3054460 secs]
>>>>>>> 19290981K->19199696K(27245952K), 0.3055930 secs] [Times: user=0.79
>>>>>>> sys=0.07,
>>>>>>> real=0.31 secs] 
>>>>>>> Thu Feb  4 11:29:01 2010 [19,229,358 K]: [GC 46597.722: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>>>>>> 10331408
>>>>>>> bytes,   10331408 total : 153344K->17024K(153344K), 0.1449040 secs]
>>>>>>> 19336016K->19229358K(27245952K), 0.1450630 secs] [Times: user=0.59
>>>>>>> sys=0.02,
>>>>>>> real=0.15 secs] 
>>>>>>> Thu Feb  4 11:29:02 2010 [19,240,015 K]: [GC 46598.038: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>>>>>> 17354120
>>>>>>> bytes,   17354120 total : 153344K->17024K(153344K), 0.0935440 secs]
>>>>>>> 19365678K->19240015K(27245952K), 0.0936870 secs] [Times: user=0.59
>>>>>>> sys=0.00,
>>>>>>> real=0.10 secs] 
>>>>>>> Thu Feb  4 11:29:02 2010 [19,302,509 K]: [GC 46598.347: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>>>>>> 17432360
>>>>>>> bytes,   17432360 total : 151519K->17024K(153344K), 0.4619430 secs]
>>>>>>> 19374510K->19302509K(27245952K), 0.4620890 secs] [Times: user=1.19
>>>>>>> sys=0.02,
>>>>>>> real=0.46 secs] 
>>>>>>> Thu Feb  4 11:29:03 2010 [19,329,594 K]: [GC 46599.030: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age   1:  
>>>>>>> 11933296
>>>>>>> bytes,   11933296 total : 153344K->17024K(153344K), 0.1492670 secs]
>>>>>>> 19438829K->19329594K(27245952K), 0.1494200 secs] [Times: user=0.77
>>>>>>> sys=0.03,
>>>>>>> real=0.15 secs] 
>>>>>>> Thu Feb  4 11:29:03 2010 [19,337,176 K]: [GC 46599.423: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:   
>>>>>>> 5719032
>>>>>>> bytes,    5719032 total : 153344K->12852K(153344K), 0.0675180 secs]
>>>>>>> 19465914K->19337176K(27245952K), 0.0676670 secs] [Times: user=0.46
>>>>>>> sys=0.02,
>>>>>>> real=0.07 secs] 
>>>>>>> Thu Feb  4 11:29:03 2010 [19,338,256 K]: [GC 46599.754: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age   1:   
>>>>>>> 6783616
>>>>>>> bytes,    6783616 total - age   2:    5570080 bytes,   12353696
>>>>>>> total :
>>>>>>> 149172K->13931K(153344K), 0.0650520 secs]
>>>>>>> 19473496K->19338256K(27245952K),
>>>>>>> 0.0652060 secs] [Times: user=0.42 sys=0.01, real=0.07 secs] 
>>>>>>> Thu Feb  4 11:29:04 2010 [19,340,264 K]: [GC 46600.144: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age   1:   
>>>>>>> 2190136
>>>>>>> bytes,    2190136 total - age   2:    6762648 bytes,    8952784
>>>>>>> total :
>>>>>>> 150251K->10375K(153344K), 0.0709470 secs]
>>>>>>> 19474576K->19340264K(27245952K),
>>>>>>> 0.0710920 secs] [Times: user=0.48 sys=0.01, real=0.07 secs] 
>>>>>>> Thu Feb  4 11:29:17 2010 [19,345,928 K]: [GC 46613.628: [ParNew
>>>>>>> Desired
>>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age   1:    
>>>>>>> 760488
>>>>>>> bytes,     760488 total - age   2:    2173472 bytes,    2933960
>>>>>>> total :
>>>>>>> 146695K->9279K(153344K), 0.0689390 secs]
>>>>>>> 19476584K->19345928K(27245952K),
>>>>>>> 0.0691070 secs] [Times: user=0.45 sys=0.00, real=0.07 secs] 
>>>>>>> Thu Feb  4 11:29:17 2010: [CMS-concurrent-mark: 15.853/18.260 secs]
>>>>>>> [Times:
>>>>>>> user=29.26 sys=0.55, real=18.26 secs] 
>>>>>>> Thu Feb  4 11:29:17 2010: [CMS-concurrent-preclean-start]
>>>>>>> ....
>>>>>>>
>>>>>>> Any idea what may causing these pauses and what possible JVM tuning
>>>>>>> would
>>>>>>> be
>>>>>>> neccessary to avoid these pauses? Your help greatly appreciated.
>>>>>>>
>>>>>>> Thank you very much.
>>>>>>>
>>>>>>> Atul
>>>>> http://old.nabble.com/file/p27463304/long.pause.in.cms.gc.log
>>>>> long.pause.in.cms.gc.log 
>>>>
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://old.nabble.com/long-unaccounted-pause-during-concurrent-mark-phase-of-ParNew-cycle-tp27459365p27470979.html
Sent from the OpenJDK Hotspot Garbage Collection mailing list archive at Nabble.com.




More information about the hotspot-gc-dev mailing list