Why abortable-preclean phase is not being aborted after YG occupancy exceeds 50%?
Bartek Markocki
bartosz.markocki at gmail.com
Fri Apr 1 17:40:26 UTC 2011
Hi Ramki,
On Fri, Apr 1, 2011 at 6:30 PM, Y. Srinivas Ramakrishna
<y.s.ramakrishna at oracle.com> wrote:
> Try -XX:+CMSSCavengeBeforeRemark as a temporary workaround
> for this, and let us know if the performance is reasonable
> or not.
We will try to push the +CMSScavengeBeforeRemark to our production but
as we are talking about the production environment it might take some
time to return to you with the results.
> I'll look at your log (can you send me your whole GC log,
> showing the problem, off-list?).
Just did.
> I think there's probably an open CR for this, which i'll
> dig up for you.
Thanks a lot!
Bartek
> On 4/1/2011 2:16 AM, Bartek Markocki wrote:
>>
>> Hi all,
>>
>> Can I ask any of you to review the attached extracts from our
>> production GC log and share your thoughts about them?
>>
>> We have a router-type web application running under tomcat 6.0.28 with
>> Java 1.6.0_21 (64bit) on RHEL 5.2 (2.6.18-92.1.22.el5). The GC
>> settings are:
>> -Xmx2048m -Xms2048m -XX:NewSize=1024m
>> -XX:PermSize=64m -XX:MaxPermSize=128m
>> -XX:ThreadStackSize=128
>> -XX:+DisableExplicitGC
>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>> -XX:+PrintGCDetails
>>
>> What we did:
>>
>> Lately we have changed from ParallelOld to CMS due to unacceptable
>> long Full GC pauses times. In preparation to the change of the
>> collector we performed a lot of GC tuning related tests and found out
>> that the above (simple) set of settings fulfill our needs in the best
>> way.
>> So far we are happy with what we see (frequency of minor scans/CMS
>> cycles, times of STW pauses) with one exception.
>>
>>
>> What is the problem:
>>
>> Some of our remark phases last much longer than others (up to 8 times
>> on avg.). Normal remark phase lasts between 55 and 90ms, the longest
>> one lasted for 538ms.
>> At first we thought that this is due to aborting the preceding
>> abortable-preclean phase. After a closer look we found out that
>> depending on the volume of traffic (i.e., time of day) in fact some of
>> our abortable-preclean phases are aborted due to time limit (5sec).
>> Despite that most of the following remark phases times still are
>> within acceptable limit (up to 100ms). So we kept digging. As a result
>> of that we found out that the abnormal long remark phases are preceded
>> by aborted abortable-preclean phase. The phase was always aborted due
>> to the time limit however if we have a look at the following report
>> for the young generation occupancy in all cases we were able to find
>> that YG was occupied in far more than 50%.
>> Per my (current :)) understanding the abortable-preclean phase can be
>> aborted due of the time limit or because YG got full in about 50% (so
>> remark phase will happen midway during two minor collections) -
>> whatever comes first. In our case the 'about 50%' condition is not
>> executed and the phase continues until it hits the time limit. The
>> following remark phase always last longer, i.e., 350-550ms.
>>
>>
>> The big question:
>>
>> What can we do to cut down the time of those long lasting remark phases?
>>
>>
>> Below I enclose three samples from our GC log presenting:
>> first one - a CMS cycle that aborted the abortable-preclean phase due
>> time limit and the following remark phase does not show the abnormal
>> behavior.
>> second one - an "ideal" CMS cycle
>> third one - a CMS cycle with aborted the abortable-preclean phase (due
>> to time limit even though YG occupancy is much greater than 50%) and
>> the following remark phase lasts for 0.5second.
>>
>> --
>> 1142110.458: [GC 1142110.458: [ParNew: 888646K->45370K(943744K),
>> 0.0728880 secs] 1852227K->1013124K(1992320K), 0.0739250 secs] [Times:
>> user=0.33 sys=0.01, real=0.07 secs]
>> 1142110.547: [GC [1 CMS-initial-mark: 967753K(1048576K)]
>> 1013331K(1992320K), 0.0540170 secs] [Times: user=0.06 sys=0.00,
>> real=0.05 secs]
>> 1142110.602: [CMS-concurrent-mark-start]
>> 1142111.010: [CMS-concurrent-mark: 0.408/0.408 secs] [Times: user=1.96
>> sys=0.07, real=0.41 secs]
>> 1142111.011: [CMS-concurrent-preclean-start]
>> 1142111.028: [CMS-concurrent-preclean: 0.016/0.017 secs] [Times:
>> user=0.02 sys=0.00, real=0.02 secs]
>> 1142111.028: [CMS-concurrent-abortable-preclean-start]
>> CMS: abort preclean due to time 1142116.036:
>> [CMS-concurrent-abortable-preclean: 4.858/5.007 secs] [Times:
>> user=7.31 sys=0.57, real=5.00 secs]
>> 1142116.050: [GC[YG occupancy: 409639 K (943744 K)]1142116.051:
>> [Rescan (parallel) , 0.0389910 secs]1142116.090: [weak refs
>> processing, 0.0156130 secs] [1 CMS-remark: 967753K(1048576K)]
>> 1377393K(1992320K), 0.0554700 secs] [Times: user=0.50 sys=0.00,
>> real=0.06 secs]
>> 1142116.107: [CMS-concurrent-sweep-start]
>> 1142117.721: [CMS-concurrent-sweep: 1.614/1.614 secs] [Times:
>> user=2.41 sys=0.24, real=1.61 secs]
>> 1142117.721: [CMS-concurrent-reset-start]
>> 1142117.732: [CMS-concurrent-reset: 0.010/0.010 secs] [Times:
>> user=0.01 sys=0.00, real=0.01 secs]
>> 1142121.278: [GC 1142121.279: [ParNew: 884282K->52652K(943744K),
>> 0.0680850 secs] 1200273K->372087K(1992320K), 0.0690040 secs] [Times:
>> user=0.29 sys=0.01, real=0.07 secs]
>> 1142133.508: [GC 1142133.508: [ParNew: 891564K->47435K(943744K),
>> 0.0682080 secs] 1210999K->370280K(1992320K), 0.0691030 secs] [Times:
>> user=0.29 sys=0.01, real=0.07 secs]
>> --
>> 1165584.305: [GC 1165584.305: [ParNew: 896212K->59055K(943744K),
>> 0.0761290 secs] 1857148K->1023947K(1992320K), 0.0771330 secs] [Times:
>> user=0.33 sys=0.00, real=0.08 secs]
>> 1165584.398: [GC [1 CMS-initial-mark: 964891K(1048576K)]
>> 1024053K(1992320K), 0.0631010 secs] [Times: user=0.06 sys=0.00,
>> real=0.06 secs]
>> 1165584.463: [CMS-concurrent-mark-start]
>> 1165584.933: [CMS-concurrent-mark: 0.423/0.471 secs] [Times: user=2.40
>> sys=0.21, real=0.47 secs]
>> 1165584.934: [CMS-concurrent-preclean-start]
>> 1165584.954: [CMS-concurrent-preclean: 0.018/0.021 secs] [Times:
>> user=0.05 sys=0.00, real=0.02 secs]
>> 1165584.955: [CMS-concurrent-abortable-preclean-start]
>> 1165587.876: [CMS-concurrent-abortable-preclean: 2.884/2.921 secs]
>> [Times: user=5.51 sys=0.65, real=2.92 secs]
>> 1165587.892: [GC[YG occupancy: 479051 K (943744 K)]1165587.892:
>> [Rescan (parallel) , 0.0746810 secs]1165587.967: [weak refs
>> processing, 0.0168870 secs] [1 CMS-remark: 964891K(1048576K)]
>> 1443943K(1992320K), 0.0925600 secs] [Times: user=0.91 sys=0.01,
>> real=0.09 secs]
>> 1165587.986: [CMS-concurrent-sweep-start]
>> 1165589.670: [CMS-concurrent-sweep: 1.684/1.684 secs] [Times:
>> user=3.39 sys=0.46, real=1.69 secs]
>> 1165589.671: [CMS-concurrent-reset-start]
>> 1165589.679: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
>> user=0.01 sys=0.00, real=0.01 secs]
>> 1165591.354: [GC 1165591.354: [ParNew: 897967K->54984K(943744K),
>> 0.0862910 secs] 1236513K->397404K(1992320K), 0.0872930 secs] [Times:
>> user=0.34 sys=0.00, real=0.09 secs]
>> 1165598.887: [GC 1165598.888: [ParNew: 893896K->52086K(943744K),
>> 0.0885510 secs] 1236316K->398587K(1992320K), 0.0895820 secs] [Times:
>> user=0.31 sys=0.01, real=0.09 secs]
>> --
>> 1166753.770: [GC 1166753.770: [ParNew: 899148K->57315K(943744K),
>> 0.0782510 secs] 1862058K->1024198K(1992320K), 0.0793040 secs] [Times:
>> user=0.32 sys=0.01, real=0.08 secs]
>> 1166753.867: [GC [1 CMS-initial-mark: 966883K(1048576K)]
>> 1024305K(1992320K), 0.0642680 secs] [Times: user=0.07 sys=0.00,
>> real=0.07 secs]
>> 1166753.932: [CMS-concurrent-mark-start]
>> 1166754.471: [CMS-concurrent-mark: 0.486/0.538 secs] [Times: user=2.76
>> sys=0.28, real=0.54 secs]
>> 1166754.471: [CMS-concurrent-preclean-start]
>> 1166754.488: [CMS-concurrent-preclean: 0.015/0.017 secs] [Times:
>> user=0.04 sys=0.00, real=0.01 secs]
>> 1166754.488: [CMS-concurrent-abortable-preclean-start]
>> CMS: abort preclean due to time 1166759.533:
>> [CMS-concurrent-abortable-preclean: 4.895/5.044 secs] [Times:
>> user=9.75 sys=1.21, real=5.05 secs]
>> 1166759.549: [GC[YG occupancy: 791197 K (943744 K)]1166759.549:
>> [Rescan (parallel) , 0.5387660 secs]1166760.088: [weak refs
>> processing, 0.0139780 secs] [1 CMS-remark: 966883K(1048576K)]
>> 1758080K(1992320K), 0.5537750 secs] [Times: user=5.58 sys=0.06,
>> real=0.56 secs]
>> 1166760.105: [CMS-concurrent-sweep-start]
>> 1166760.688: [GC 1166760.689: [ParNew: 896188K->57161K(943744K),
>> 0.0727850 secs] 1623884K->788963K(1992320K), 0.0737390 secs] [Times:
>> user=0.31 sys=0.02, real=0.08 secs]
>> 1166761.593: [CMS-concurrent-sweep: 1.363/1.488 secs] [Times:
>> user=3.48 sys=0.49, real=1.49 secs]
>> 1166761.593: [CMS-concurrent-reset-start]
>> 1166761.602: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
>> user=0.02 sys=0.01, real=0.01 secs]
>> 1166767.947: [GC 1166767.948: [ParNew: 896053K->58188K(943744K),
>> 0.0817680 secs] 1238926K->404605K(1992320K), 0.0828270 secs] [Times:
>> user=0.31 sys=0.01, real=0.08 secs]
>> --
>>
>> Thank you in advance,
>> Bartek
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
More information about the hotspot-gc-dev
mailing list