long unaccounted pause during concurrent mark phase of ParNew cycle
Jon Masamitsu
Jon.Masamitsu at Sun.COM
Fri Feb 5 17:00:17 UTC 2010
On 02/05/10 08:47, atulksh wrote:
> Hi Jon,
> We are currently still using Update 2 on our servers. We see the same kind
> of pause behavior on Update 2 as well as Update 17. The pause behavior does
> not seem to change between these two versions.
Did you notice this behavior after upgrading to jdk6 update 2 (i.e.,
did it work OK on a jdk5 release or earlier)?
>
> Thank you,
>
> Atul
>
>
> Jon Masamitsu wrote:
>> Atul,
>>
>> You mentioned that you were on jdk6 update 2 initially.
>> Did you see this behavior with earlier releases?
>>
>> Jon
>>
>> On 02/04/10 20:02, atulksh wrote:
>>> Hi Jon,
>>> I have some more (probably more precise) information for you. I have been
>>> able to reproduce this issue in our test environment (up till now we were
>>> only seeing it in our production setup). Along with the parameters that I
>>> have specified above I added the following.
>>>
>>>
>>>
>>> After loading up the app server (it is Tomcat 6.0.14) I got a pause
>>> during
>>> the CMS cycle of about 32 seconds. The way I noticed the pause is like
>>> this.
>>> I kept on running a script that does some basic page refresh (small data
>>> page) in the application continously. This is like a ping to the server.
>>> Then I parallely do operations that load up the memory in the application
>>> server. These are the real hits that users do in whom memory requirement
>>> is
>>> usually high and that drives the app server to the CMS threshold.
>>>
>>> In my case once the CMS kicked in, I saw that the period where the
>>> applilcation was in the "pause" no requests of my script were processed.
>>> They were just queued up to Tomcat - meaning Apache got those requests
>>> and
>>> logged that they were sent to Tomcat. But Tomcat did not register of
>>> receiving them. Then after the app went out of the "pause" then
>>> everything
>>> came back and all these pending requests got processed - like the app
>>> just
>>> had been unfrozen.
>>>
>>> During this freeze, what I see in the GC log is that the CMS cycle is
>>> almost
>>> at the end of the concurrent mark phase. It is doing the GC is doing the
>>> "perm space scanning". The pause exactly coincides with the time reported
>>> for the "perm space scanning". I am attaching the full gc log snippt
>>> right
>>> from the start of the CMS cycle. Below is just the place where the pause
>>> appears.
>>>
>>> 28676.784: [GC 28676.784: [ParNew
>>> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
>>> - age 1: 3742608 bytes, 3742608 total
>>> - age 2: 3451152 bytes, 7193760 total
>>> : 76672K->8512K(76672K), 0.2045430 secs] 20262376K->20197311K(27254464K),
>>> 0.2048030 secs] [Times: user=0.77 sys=0.01, real=0.21 secs]
>>> Total time for which application threads were stopped: 0.2055080 seconds
>>> Finished perm space scanning in 2th thread: 0.367 sec
>>> Finished perm space scanning in 0th thread: 0.362 sec
>>> Application time: 0.1481860 seconds
>>> 28708.626: [GC 28708.627: [ParNew
>>> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
>>> - age 1: 2079760 bytes, 2079760 total
>>> - age 2: 3657248 bytes, 5737008 total
>>> : 76672K->6795K(76672K), 0.1937080 secs] 20265471K->20198965K(27254464K),
>>> 0.1939480 secs] [Times: user=0.74 sys=0.00, real=0.20 secs]
>>> Total time for which application threads were stopped: 0.1948820 seconds
>>> Finished perm space scanning in 3th thread: 32.153 sec
>>> Finished perm space scanning in 1th thread: 32.097 sec
>>> Finished work stealing in 1th thread: 0.000 sec
>>> Finished work stealing in 3th thread: 0.002 sec
>>> Finished work stealing in 2th thread: 31.791 sec
>>> Finished work stealing in 0th thread: 31.791 sec
>>> 28708.825: [CMS-concurrent-mark: 88.906/158.952 secs]
>>> (CMS-concurrent-mark
>>> yielded 454 times)
>>> [Times: user=544.33 sys=4.59, real=158.95 secs]
>>>
>>> What is "perm space scanning" and is this where the gc would freeze the
>>> application (STW type). Is this a bug or something that is expected?
>>>
>>> Please let me know. This has become quite critical for our application.
>>> Please see the full gc log file attached. Your help very much
>>> appreciated.
>>>
>>> Thank you,
>>>
>>> Atul
>>>
>>>
>>> Jon Masamitsu wrote:
>>>> Atul,
>>>>
>>>> Do you mean the 36 secs here?
>>>>
>>>> > Thu Feb 4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715 secs]
>>>> [Times:
>>>> > user=56.34 sys=0.34, real=36.71 secs]
>>>>
>>>> That is a concurrent phase (i.e., the application threads
>>>> are still running although they have to share the hardware
>>>> threads with the concurrent GC threads).
>>>>
>>>> Jon
>>>>
>>>> On 02/04/10 12:37, atulksh wrote:
>>>>> Hi,
>>>>> We have an application that is freezing (pausing) for a very long (tens
>>>>> of
>>>>> seconds) time during the concurrent mark phase of the ParNew cycle.
>>>>> This
>>>>> pause is consistently seen when there is good amount of activity on the
>>>>> server. We tried to upgrade the JVM (JRE 6) from Update 02 to Update 17
>>>>> keeping all the parameters same, but the pauses did not go away (they
>>>>> became
>>>>> more consistent and slightly longer). We even tried to enable
>>>>> CompressedOops
>>>>> option, thinking that that would reduce memory requirements, but that
>>>>> has
>>>>> not helped.
>>>>>
>>>>> We are running on 64bit SUSE 10.1 platform with 32Gb of RAM and 8 core
>>>>> machine. 26g is allocated to the java process. The full set of VM
>>>>> parameters
>>>>> are here
>>>>>
>>>>>
>>>>> -Xmx26g
>>>>> -Xms26g
>>>>> -server
>>>>> -XX:+UseConcMarkSweepGC
>>>>> -XX:PermSize=256m
>>>>> -XX:MaxPermSize=256m
>>>>> -XX:+PrintTenuringDistribution
>>>>> -XX:+PrintGCDetails
>>>>> -XX:+PrintGCTimeStamps
>>>>> -XX:CMSInitiatingOccupancyFraction=70
>>>>> -XX:-CMSParallelRemarkEnabled
>>>>>
>>>>> The nature of the pause is like this. All the application threads that
>>>>> are
>>>>> being processed seem to stop responding. New requests that come to our
>>>>> app
>>>>> server are not processed and get queued. When the pause ends then
>>>>> everything
>>>>> frees up and seems to go on normally henceforth. In the GC you see a
>>>>> gap
>>>>> between the ParNews. The following are two examples of the pauses.
>>>>> Notice
>>>>> the pause from 10:13:06 to 10:13:35 (29 seconds) in the first one and
>>>>> 11:29:04 to 11:29:17 (13 seconds) in the second one. These pauses only
>>>>> occur
>>>>> in the concurrent mark phase. They do not seem to occur at any other
>>>>> time
>>>>> or
>>>>> anywhere else in the CMS cycle.
>>>>>
>>>>> ....
>>>>> Thu Feb 4 10:12:59 2010: [GC [1 CMS-initial-mark:
>>>>> 18976170K(27092608K)]
>>>>> 18989786K(27245952K), 0.0059140 secs] [Times: user=0.01 sys=0.00,
>>>>> real=0.01
>>>>> secs]
>>>>> Thu Feb 4 10:12:59 2010: [CMS-concurrent-mark-start]
>>>>> Thu Feb 4 10:12:59 2010 [18,980,009 K]: [GC 42035.293: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 1833064
>>>>> bytes, 1833064 total - age 2: 18472 bytes, 1851536 total :
>>>>> 148315K->3839K(153344K), 0.0400230 secs]
>>>>> 19124486K->18980009K(27245952K),
>>>>> 0.0401580 secs] [Times: user=0.25 sys=0.00, real=0.04 secs]
>>>>> Thu Feb 4 10:12:59 2010 [18,979,036 K]: [GC 42035.415: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 2068376
>>>>> bytes, 2068376 total - age 2: 23792 bytes, 2092168 total -
>>>>> age
>>>>> 3: 18472 bytes, 2110640 total : 140159K->2866K(153344K),
>>>>> 0.0352200
>>>>> secs] 19116329K->18979036K(27245952K), 0.0353520 secs] [Times:
>>>>> user=0.29
>>>>> sys=0.00, real=0.03 secs]
>>>>> Thu Feb 4 10:12:59 2010 [18,979,103 K]: [GC 42035.535: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 1782608
>>>>> bytes, 1782608 total - age 2: 22032 bytes, 1804640 total -
>>>>> age
>>>>> 3: 23792 bytes, 1828432 total - age 4: 18472 bytes,
>>>>> 1846904 total : 139186K->2933K(153344K), 0.0498480 secs]
>>>>> 19115356K->18979103K(27245952K), 0.0499850 secs] [Times: user=0.24
>>>>> sys=0.00,
>>>>> real=0.05 secs]
>>>>> Thu Feb 4 10:12:59 2010 [18,979,181 K]: [GC 42035.673: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 2161000
>>>>> bytes, 2161000 total - age 2: 22032 bytes, 2183032 total -
>>>>> age
>>>>> 3: 22032 bytes, 2205064 total - age 4: 23792 bytes,
>>>>> 2228856 total : 139253K->2992K(153344K), 0.0400420 secs]
>>>>> 19115423K->18979181K(27245952K), 0.0401730 secs] [Times: user=0.26
>>>>> sys=0.00,
>>>>> real=0.04 secs]
>>>>> Thu Feb 4 10:13:06 2010 [18,979,876 K]: [GC 42042.243: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 2113272
>>>>> bytes, 2113272 total - age 2: 23792 bytes, 2137064 total -
>>>>> age
>>>>> 3: 22032 bytes, 2159096 total - age 4: 22032 bytes,
>>>>> 2181128 total : 138855K->3662K(153344K), 0.0465500 secs]
>>>>> 19115044K->18979876K(27245952K), 0.0466840 secs] [Times: user=0.27
>>>>> sys=0.00,
>>>>> real=0.04 secs]
>>>>> Thu Feb 4 10:13:35 2010 [18,979,051 K]: [GC 42071.809: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 219056
>>>>> bytes, 219056 total - age 2: 1867632 bytes, 2086688 total -
>>>>> age
>>>>> 3: 23792 bytes, 2110480 total - age 4: 22032 bytes,
>>>>> 2132512 total : 139982K->2814K(153344K), 0.0426920 secs]
>>>>> 19116196K->18979051K(27245952K), 0.0428200 secs] [Times: user=0.28
>>>>> sys=0.00,
>>>>> real=0.05 secs]
>>>>> Thu Feb 4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715 secs]
>>>>> [Times:
>>>>> user=56.34 sys=0.34, real=36.71 secs]
>>>>> ....
>>>>>
>>>>>
>>>>> ....
>>>>> Thu Feb 4 11:28:59 2010: [GC [1 CMS-initial-mark:
>>>>> 19012252K(27092608K)]
>>>>> 19032719K(27245952K), 0.0279440 secs] [Times: user=0.02 sys=0.00,
>>>>> real=0.03
>>>>> secs]
>>>>> Thu Feb 4 11:28:59 2010: [CMS-concurrent-mark-start]
>>>>> Thu Feb 4 11:28:59 2010 [19,154,661 K]: [GC 46595.614: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>> 17431976
>>>>> bytes, 17431976 total : 153344K->17024K(153344K), 0.9259390 secs]
>>>>> 19165596K->19154661K(27245952K), 0.9260890 secs] [Times: user=1.23
>>>>> sys=0.05,
>>>>> real=0.93 secs]
>>>>> Thu Feb 4 11:29:00 2010 [19,199,696 K]: [GC 46596.973: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>> 17448432
>>>>> bytes, 17448432 total : 153344K->17024K(153344K), 0.3054460 secs]
>>>>> 19290981K->19199696K(27245952K), 0.3055930 secs] [Times: user=0.79
>>>>> sys=0.07,
>>>>> real=0.31 secs]
>>>>> Thu Feb 4 11:29:01 2010 [19,229,358 K]: [GC 46597.722: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>> 10331408
>>>>> bytes, 10331408 total : 153344K->17024K(153344K), 0.1449040 secs]
>>>>> 19336016K->19229358K(27245952K), 0.1450630 secs] [Times: user=0.59
>>>>> sys=0.02,
>>>>> real=0.15 secs]
>>>>> Thu Feb 4 11:29:02 2010 [19,240,015 K]: [GC 46598.038: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>> 17354120
>>>>> bytes, 17354120 total : 153344K->17024K(153344K), 0.0935440 secs]
>>>>> 19365678K->19240015K(27245952K), 0.0936870 secs] [Times: user=0.59
>>>>> sys=0.00,
>>>>> real=0.10 secs]
>>>>> Thu Feb 4 11:29:02 2010 [19,302,509 K]: [GC 46598.347: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>> 17432360
>>>>> bytes, 17432360 total : 151519K->17024K(153344K), 0.4619430 secs]
>>>>> 19374510K->19302509K(27245952K), 0.4620890 secs] [Times: user=1.19
>>>>> sys=0.02,
>>>>> real=0.46 secs]
>>>>> Thu Feb 4 11:29:03 2010 [19,329,594 K]: [GC 46599.030: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>> 11933296
>>>>> bytes, 11933296 total : 153344K->17024K(153344K), 0.1492670 secs]
>>>>> 19438829K->19329594K(27245952K), 0.1494200 secs] [Times: user=0.77
>>>>> sys=0.03,
>>>>> real=0.15 secs]
>>>>> Thu Feb 4 11:29:03 2010 [19,337,176 K]: [GC 46599.423: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 5719032
>>>>> bytes, 5719032 total : 153344K->12852K(153344K), 0.0675180 secs]
>>>>> 19465914K->19337176K(27245952K), 0.0676670 secs] [Times: user=0.46
>>>>> sys=0.02,
>>>>> real=0.07 secs]
>>>>> Thu Feb 4 11:29:03 2010 [19,338,256 K]: [GC 46599.754: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age 1:
>>>>> 6783616
>>>>> bytes, 6783616 total - age 2: 5570080 bytes, 12353696 total :
>>>>> 149172K->13931K(153344K), 0.0650520 secs]
>>>>> 19473496K->19338256K(27245952K),
>>>>> 0.0652060 secs] [Times: user=0.42 sys=0.01, real=0.07 secs]
>>>>> Thu Feb 4 11:29:04 2010 [19,340,264 K]: [GC 46600.144: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age 1:
>>>>> 2190136
>>>>> bytes, 2190136 total - age 2: 6762648 bytes, 8952784 total :
>>>>> 150251K->10375K(153344K), 0.0709470 secs]
>>>>> 19474576K->19340264K(27245952K),
>>>>> 0.0710920 secs] [Times: user=0.48 sys=0.01, real=0.07 secs]
>>>>> Thu Feb 4 11:29:17 2010 [19,345,928 K]: [GC 46613.628: [ParNew Desired
>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>> 760488
>>>>> bytes, 760488 total - age 2: 2173472 bytes, 2933960 total :
>>>>> 146695K->9279K(153344K), 0.0689390 secs]
>>>>> 19476584K->19345928K(27245952K),
>>>>> 0.0691070 secs] [Times: user=0.45 sys=0.00, real=0.07 secs]
>>>>> Thu Feb 4 11:29:17 2010: [CMS-concurrent-mark: 15.853/18.260 secs]
>>>>> [Times:
>>>>> user=29.26 sys=0.55, real=18.26 secs]
>>>>> Thu Feb 4 11:29:17 2010: [CMS-concurrent-preclean-start]
>>>>> ....
>>>>>
>>>>> Any idea what may causing these pauses and what possible JVM tuning
>>>>> would
>>>>> be
>>>>> neccessary to avoid these pauses? Your help greatly appreciated.
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Atul
>>> http://old.nabble.com/file/p27463304/long.pause.in.cms.gc.log
>>> long.pause.in.cms.gc.log
>>
>
More information about the hotspot-gc-dev
mailing list