long unaccounted pause during concurrent mark phase of ParNew cycle
atulksh
atulksh at hotmail.com
Fri Feb 5 17:10:08 UTC 2010
I see. Sorry for not getting your previous question. Actually before Update
02 we did not see this issue. But that was a very different situation
altogether since our servers did not have that much memory allocated to
them. Only recently we have seen this issue come to surface because perhaps
it has now become very noticeable.
One correlation that we have found is that the pauses occur when there is
significant load on the system (in terms of user activity - not necessarily
user hits). When the system is relatively less uses the pauses are also
relatively less.
Thank you,
Atul
Jon Masamitsu wrote:
>
>
>
> On 02/05/10 08:47, atulksh wrote:
>> Hi Jon,
>> We are currently still using Update 2 on our servers. We see the same
>> kind
>> of pause behavior on Update 2 as well as Update 17. The pause behavior
>> does
>> not seem to change between these two versions.
>
> Did you notice this behavior after upgrading to jdk6 update 2 (i.e.,
> did it work OK on a jdk5 release or earlier)?
>
>>
>> Thank you,
>>
>> Atul
>>
>>
>> Jon Masamitsu wrote:
>>> Atul,
>>>
>>> You mentioned that you were on jdk6 update 2 initially.
>>> Did you see this behavior with earlier releases?
>>>
>>> Jon
>>>
>>> On 02/04/10 20:02, atulksh wrote:
>>>> Hi Jon,
>>>> I have some more (probably more precise) information for you. I have
>>>> been
>>>> able to reproduce this issue in our test environment (up till now we
>>>> were
>>>> only seeing it in our production setup). Along with the parameters that
>>>> I
>>>> have specified above I added the following.
>>>>
>>>>
>>>>
>>>> After loading up the app server (it is Tomcat 6.0.14) I got a pause
>>>> during
>>>> the CMS cycle of about 32 seconds. The way I noticed the pause is like
>>>> this.
>>>> I kept on running a script that does some basic page refresh (small
>>>> data
>>>> page) in the application continously. This is like a ping to the
>>>> server.
>>>> Then I parallely do operations that load up the memory in the
>>>> application
>>>> server. These are the real hits that users do in whom memory
>>>> requirement
>>>> is
>>>> usually high and that drives the app server to the CMS threshold.
>>>>
>>>> In my case once the CMS kicked in, I saw that the period where the
>>>> applilcation was in the "pause" no requests of my script were
>>>> processed.
>>>> They were just queued up to Tomcat - meaning Apache got those requests
>>>> and
>>>> logged that they were sent to Tomcat. But Tomcat did not register of
>>>> receiving them. Then after the app went out of the "pause" then
>>>> everything
>>>> came back and all these pending requests got processed - like the app
>>>> just
>>>> had been unfrozen.
>>>>
>>>> During this freeze, what I see in the GC log is that the CMS cycle is
>>>> almost
>>>> at the end of the concurrent mark phase. It is doing the GC is doing
>>>> the
>>>> "perm space scanning". The pause exactly coincides with the time
>>>> reported
>>>> for the "perm space scanning". I am attaching the full gc log snippt
>>>> right
>>>> from the start of the CMS cycle. Below is just the place where the
>>>> pause
>>>> appears.
>>>>
>>>> 28676.784: [GC 28676.784: [ParNew
>>>> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
>>>> - age 1: 3742608 bytes, 3742608 total
>>>> - age 2: 3451152 bytes, 7193760 total
>>>> : 76672K->8512K(76672K), 0.2045430 secs]
>>>> 20262376K->20197311K(27254464K),
>>>> 0.2048030 secs] [Times: user=0.77 sys=0.01, real=0.21 secs]
>>>> Total time for which application threads were stopped: 0.2055080
>>>> seconds
>>>> Finished perm space scanning in 2th thread: 0.367 sec
>>>> Finished perm space scanning in 0th thread: 0.362 sec
>>>> Application time: 0.1481860 seconds
>>>> 28708.626: [GC 28708.627: [ParNew
>>>> Desired survivor size 4358144 bytes, new threshold 2 (max 4)
>>>> - age 1: 2079760 bytes, 2079760 total
>>>> - age 2: 3657248 bytes, 5737008 total
>>>> : 76672K->6795K(76672K), 0.1937080 secs]
>>>> 20265471K->20198965K(27254464K),
>>>> 0.1939480 secs] [Times: user=0.74 sys=0.00, real=0.20 secs]
>>>> Total time for which application threads were stopped: 0.1948820
>>>> seconds
>>>> Finished perm space scanning in 3th thread: 32.153 sec
>>>> Finished perm space scanning in 1th thread: 32.097 sec
>>>> Finished work stealing in 1th thread: 0.000 sec
>>>> Finished work stealing in 3th thread: 0.002 sec
>>>> Finished work stealing in 2th thread: 31.791 sec
>>>> Finished work stealing in 0th thread: 31.791 sec
>>>> 28708.825: [CMS-concurrent-mark: 88.906/158.952 secs]
>>>> (CMS-concurrent-mark
>>>> yielded 454 times)
>>>> [Times: user=544.33 sys=4.59, real=158.95 secs]
>>>>
>>>> What is "perm space scanning" and is this where the gc would freeze the
>>>> application (STW type). Is this a bug or something that is expected?
>>>>
>>>> Please let me know. This has become quite critical for our application.
>>>> Please see the full gc log file attached. Your help very much
>>>> appreciated.
>>>>
>>>> Thank you,
>>>>
>>>> Atul
>>>>
>>>>
>>>> Jon Masamitsu wrote:
>>>>> Atul,
>>>>>
>>>>> Do you mean the 36 secs here?
>>>>>
>>>>> > Thu Feb 4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715 secs]
>>>>> [Times:
>>>>> > user=56.34 sys=0.34, real=36.71 secs]
>>>>>
>>>>> That is a concurrent phase (i.e., the application threads
>>>>> are still running although they have to share the hardware
>>>>> threads with the concurrent GC threads).
>>>>>
>>>>> Jon
>>>>>
>>>>> On 02/04/10 12:37, atulksh wrote:
>>>>>> Hi,
>>>>>> We have an application that is freezing (pausing) for a very long
>>>>>> (tens
>>>>>> of
>>>>>> seconds) time during the concurrent mark phase of the ParNew cycle.
>>>>>> This
>>>>>> pause is consistently seen when there is good amount of activity on
>>>>>> the
>>>>>> server. We tried to upgrade the JVM (JRE 6) from Update 02 to Update
>>>>>> 17
>>>>>> keeping all the parameters same, but the pauses did not go away (they
>>>>>> became
>>>>>> more consistent and slightly longer). We even tried to enable
>>>>>> CompressedOops
>>>>>> option, thinking that that would reduce memory requirements, but that
>>>>>> has
>>>>>> not helped.
>>>>>>
>>>>>> We are running on 64bit SUSE 10.1 platform with 32Gb of RAM and 8
>>>>>> core
>>>>>> machine. 26g is allocated to the java process. The full set of VM
>>>>>> parameters
>>>>>> are here
>>>>>>
>>>>>>
>>>>>> -Xmx26g
>>>>>> -Xms26g
>>>>>> -server
>>>>>> -XX:+UseConcMarkSweepGC
>>>>>> -XX:PermSize=256m
>>>>>> -XX:MaxPermSize=256m
>>>>>> -XX:+PrintTenuringDistribution
>>>>>> -XX:+PrintGCDetails
>>>>>> -XX:+PrintGCTimeStamps
>>>>>> -XX:CMSInitiatingOccupancyFraction=70
>>>>>> -XX:-CMSParallelRemarkEnabled
>>>>>>
>>>>>> The nature of the pause is like this. All the application threads
>>>>>> that
>>>>>> are
>>>>>> being processed seem to stop responding. New requests that come to
>>>>>> our
>>>>>> app
>>>>>> server are not processed and get queued. When the pause ends then
>>>>>> everything
>>>>>> frees up and seems to go on normally henceforth. In the GC you see a
>>>>>> gap
>>>>>> between the ParNews. The following are two examples of the pauses.
>>>>>> Notice
>>>>>> the pause from 10:13:06 to 10:13:35 (29 seconds) in the first one and
>>>>>> 11:29:04 to 11:29:17 (13 seconds) in the second one. These pauses
>>>>>> only
>>>>>> occur
>>>>>> in the concurrent mark phase. They do not seem to occur at any other
>>>>>> time
>>>>>> or
>>>>>> anywhere else in the CMS cycle.
>>>>>>
>>>>>> ....
>>>>>> Thu Feb 4 10:12:59 2010: [GC [1 CMS-initial-mark:
>>>>>> 18976170K(27092608K)]
>>>>>> 18989786K(27245952K), 0.0059140 secs] [Times: user=0.01 sys=0.00,
>>>>>> real=0.01
>>>>>> secs]
>>>>>> Thu Feb 4 10:12:59 2010: [CMS-concurrent-mark-start]
>>>>>> Thu Feb 4 10:12:59 2010 [18,980,009 K]: [GC 42035.293: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 1833064
>>>>>> bytes, 1833064 total - age 2: 18472 bytes, 1851536 total
>>>>>> :
>>>>>> 148315K->3839K(153344K), 0.0400230 secs]
>>>>>> 19124486K->18980009K(27245952K),
>>>>>> 0.0401580 secs] [Times: user=0.25 sys=0.00, real=0.04 secs]
>>>>>> Thu Feb 4 10:12:59 2010 [18,979,036 K]: [GC 42035.415: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 2068376
>>>>>> bytes, 2068376 total - age 2: 23792 bytes, 2092168 total
>>>>>> -
>>>>>> age
>>>>>> 3: 18472 bytes, 2110640 total : 140159K->2866K(153344K),
>>>>>> 0.0352200
>>>>>> secs] 19116329K->18979036K(27245952K), 0.0353520 secs] [Times:
>>>>>> user=0.29
>>>>>> sys=0.00, real=0.03 secs]
>>>>>> Thu Feb 4 10:12:59 2010 [18,979,103 K]: [GC 42035.535: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 1782608
>>>>>> bytes, 1782608 total - age 2: 22032 bytes, 1804640 total
>>>>>> -
>>>>>> age
>>>>>> 3: 23792 bytes, 1828432 total - age 4: 18472 bytes,
>>>>>> 1846904 total : 139186K->2933K(153344K), 0.0498480 secs]
>>>>>> 19115356K->18979103K(27245952K), 0.0499850 secs] [Times: user=0.24
>>>>>> sys=0.00,
>>>>>> real=0.05 secs]
>>>>>> Thu Feb 4 10:12:59 2010 [18,979,181 K]: [GC 42035.673: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 2161000
>>>>>> bytes, 2161000 total - age 2: 22032 bytes, 2183032 total
>>>>>> -
>>>>>> age
>>>>>> 3: 22032 bytes, 2205064 total - age 4: 23792 bytes,
>>>>>> 2228856 total : 139253K->2992K(153344K), 0.0400420 secs]
>>>>>> 19115423K->18979181K(27245952K), 0.0401730 secs] [Times: user=0.26
>>>>>> sys=0.00,
>>>>>> real=0.04 secs]
>>>>>> Thu Feb 4 10:13:06 2010 [18,979,876 K]: [GC 42042.243: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 2113272
>>>>>> bytes, 2113272 total - age 2: 23792 bytes, 2137064 total
>>>>>> -
>>>>>> age
>>>>>> 3: 22032 bytes, 2159096 total - age 4: 22032 bytes,
>>>>>> 2181128 total : 138855K->3662K(153344K), 0.0465500 secs]
>>>>>> 19115044K->18979876K(27245952K), 0.0466840 secs] [Times: user=0.27
>>>>>> sys=0.00,
>>>>>> real=0.04 secs]
>>>>>> Thu Feb 4 10:13:35 2010 [18,979,051 K]: [GC 42071.809: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 219056
>>>>>> bytes, 219056 total - age 2: 1867632 bytes, 2086688 total
>>>>>> -
>>>>>> age
>>>>>> 3: 23792 bytes, 2110480 total - age 4: 22032 bytes,
>>>>>> 2132512 total : 139982K->2814K(153344K), 0.0426920 secs]
>>>>>> 19116196K->18979051K(27245952K), 0.0428200 secs] [Times: user=0.28
>>>>>> sys=0.00,
>>>>>> real=0.05 secs]
>>>>>> Thu Feb 4 10:13:35 2010: [CMS-concurrent-mark: 36.272/36.715 secs]
>>>>>> [Times:
>>>>>> user=56.34 sys=0.34, real=36.71 secs]
>>>>>> ....
>>>>>>
>>>>>>
>>>>>> ....
>>>>>> Thu Feb 4 11:28:59 2010: [GC [1 CMS-initial-mark:
>>>>>> 19012252K(27092608K)]
>>>>>> 19032719K(27245952K), 0.0279440 secs] [Times: user=0.02 sys=0.00,
>>>>>> real=0.03
>>>>>> secs]
>>>>>> Thu Feb 4 11:28:59 2010: [CMS-concurrent-mark-start]
>>>>>> Thu Feb 4 11:28:59 2010 [19,154,661 K]: [GC 46595.614: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>>> 17431976
>>>>>> bytes, 17431976 total : 153344K->17024K(153344K), 0.9259390 secs]
>>>>>> 19165596K->19154661K(27245952K), 0.9260890 secs] [Times: user=1.23
>>>>>> sys=0.05,
>>>>>> real=0.93 secs]
>>>>>> Thu Feb 4 11:29:00 2010 [19,199,696 K]: [GC 46596.973: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>>> 17448432
>>>>>> bytes, 17448432 total : 153344K->17024K(153344K), 0.3054460 secs]
>>>>>> 19290981K->19199696K(27245952K), 0.3055930 secs] [Times: user=0.79
>>>>>> sys=0.07,
>>>>>> real=0.31 secs]
>>>>>> Thu Feb 4 11:29:01 2010 [19,229,358 K]: [GC 46597.722: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>>> 10331408
>>>>>> bytes, 10331408 total : 153344K->17024K(153344K), 0.1449040 secs]
>>>>>> 19336016K->19229358K(27245952K), 0.1450630 secs] [Times: user=0.59
>>>>>> sys=0.02,
>>>>>> real=0.15 secs]
>>>>>> Thu Feb 4 11:29:02 2010 [19,240,015 K]: [GC 46598.038: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>>> 17354120
>>>>>> bytes, 17354120 total : 153344K->17024K(153344K), 0.0935440 secs]
>>>>>> 19365678K->19240015K(27245952K), 0.0936870 secs] [Times: user=0.59
>>>>>> sys=0.00,
>>>>>> real=0.10 secs]
>>>>>> Thu Feb 4 11:29:02 2010 [19,302,509 K]: [GC 46598.347: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>>> 17432360
>>>>>> bytes, 17432360 total : 151519K->17024K(153344K), 0.4619430 secs]
>>>>>> 19374510K->19302509K(27245952K), 0.4620890 secs] [Times: user=1.19
>>>>>> sys=0.02,
>>>>>> real=0.46 secs]
>>>>>> Thu Feb 4 11:29:03 2010 [19,329,594 K]: [GC 46599.030: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 1 (max 4) - age 1:
>>>>>> 11933296
>>>>>> bytes, 11933296 total : 153344K->17024K(153344K), 0.1492670 secs]
>>>>>> 19438829K->19329594K(27245952K), 0.1494200 secs] [Times: user=0.77
>>>>>> sys=0.03,
>>>>>> real=0.15 secs]
>>>>>> Thu Feb 4 11:29:03 2010 [19,337,176 K]: [GC 46599.423: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 5719032
>>>>>> bytes, 5719032 total : 153344K->12852K(153344K), 0.0675180 secs]
>>>>>> 19465914K->19337176K(27245952K), 0.0676670 secs] [Times: user=0.46
>>>>>> sys=0.02,
>>>>>> real=0.07 secs]
>>>>>> Thu Feb 4 11:29:03 2010 [19,338,256 K]: [GC 46599.754: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age 1:
>>>>>> 6783616
>>>>>> bytes, 6783616 total - age 2: 5570080 bytes, 12353696 total
>>>>>> :
>>>>>> 149172K->13931K(153344K), 0.0650520 secs]
>>>>>> 19473496K->19338256K(27245952K),
>>>>>> 0.0652060 secs] [Times: user=0.42 sys=0.01, real=0.07 secs]
>>>>>> Thu Feb 4 11:29:04 2010 [19,340,264 K]: [GC 46600.144: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 2 (max 4) - age 1:
>>>>>> 2190136
>>>>>> bytes, 2190136 total - age 2: 6762648 bytes, 8952784 total
>>>>>> :
>>>>>> 150251K->10375K(153344K), 0.0709470 secs]
>>>>>> 19474576K->19340264K(27245952K),
>>>>>> 0.0710920 secs] [Times: user=0.48 sys=0.01, real=0.07 secs]
>>>>>> Thu Feb 4 11:29:17 2010 [19,345,928 K]: [GC 46613.628: [ParNew
>>>>>> Desired
>>>>>> survivor size 8716288 bytes, new threshold 4 (max 4) - age 1:
>>>>>> 760488
>>>>>> bytes, 760488 total - age 2: 2173472 bytes, 2933960 total
>>>>>> :
>>>>>> 146695K->9279K(153344K), 0.0689390 secs]
>>>>>> 19476584K->19345928K(27245952K),
>>>>>> 0.0691070 secs] [Times: user=0.45 sys=0.00, real=0.07 secs]
>>>>>> Thu Feb 4 11:29:17 2010: [CMS-concurrent-mark: 15.853/18.260 secs]
>>>>>> [Times:
>>>>>> user=29.26 sys=0.55, real=18.26 secs]
>>>>>> Thu Feb 4 11:29:17 2010: [CMS-concurrent-preclean-start]
>>>>>> ....
>>>>>>
>>>>>> Any idea what may causing these pauses and what possible JVM tuning
>>>>>> would
>>>>>> be
>>>>>> neccessary to avoid these pauses? Your help greatly appreciated.
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> Atul
>>>> http://old.nabble.com/file/p27463304/long.pause.in.cms.gc.log
>>>> long.pause.in.cms.gc.log
>>>
>>
>
>
--
View this message in context: http://old.nabble.com/long-unaccounted-pause-during-concurrent-mark-phase-of-ParNew-cycle-tp27459365p27470926.html
Sent from the OpenJDK Hotspot Garbage Collection mailing list archive at Nabble.com.
More information about the hotspot-gc-dev
mailing list