Why is the abortable-preclean phase not aborted after YG occupancy exceeds 50%?
Y. S. Ramakrishna
y.s.ramakrishna at oracle.com
Wed Apr 6 23:53:01 UTC 2011
Hi Bartek --
On 04/01/11 10:40, Bartek Markocki wrote:
> Hi Ramki,
>
> On Fri, Apr 1, 2011 at 6:30 PM, Y. Srinivas Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
>> Try -XX:+CMSScavengeBeforeRemark as a temporary workaround
>> for this, and let us know if the performance is reasonable
>> or not.
> We will try to push +CMSScavengeBeforeRemark to our production, but
> since we are talking about a production environment it may take some
> time before we can get back to you with the results.
>
>> I'll look at your log (can you send me your whole GC log,
>> showing the problem, off-list?).
> Just did.
>
>> I think there's probably an open CR for this, which i'll
>> dig up for you.
> Thanks a lot!
The CR I had in mind is this one:
6990419 CMS: Remaining work for 6572569: consistently skewed work distribution in (long) re-mark pauses
It's an RFE, and I added you to the "Service Request".
If you have a support contract with Oracle, please
send the SR# to your support engineer, so he can
do the needful.
I looked at your logs and it seems very much like
the problem I mention in this CR, although data
from -XX:PrintCMSStatistics=2 would help ascertain
whether that is the issue. (Note to self: make
PrintCMSStatistics a manageable flag so it can
be turned on in a live JVM rather than requiring
a fresh run; ditto for CMSScavengeBeforeRemark.
I'll file RFEs for those, although I am not sure
when we can get them done.)
An alternative workaround that might also work
for you would be -XX:CMSWaitDuration=X,
where X is at least twice the maximum inter-scavenge
duration observed in your application. (The RFE
is, among other things, to ergonomify that setting.)
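One rough way to estimate X from an existing GC log is to take the timestamps of the ParNew entries and look at the largest gap between consecutive scavenges. Below is a sketch of that, using three (abbreviated) ParNew lines from the first sample further down in this thread; it assumes CMSWaitDuration is specified in milliseconds:

```python
import re

# Abbreviated ParNew lines from the first GC-log sample in this thread;
# timestamps are seconds since JVM start.
LOG = """\
1142110.458: [GC 1142110.458: [ParNew: 888646K->45370K(943744K), 0.0728880 secs]
1142121.278: [GC 1142121.279: [ParNew: 884282K->52652K(943744K), 0.0680850 secs]
1142133.508: [GC 1142133.508: [ParNew: 891564K->47435K(943744K), 0.0682080 secs]
"""

# Timestamps of young-generation (ParNew) collections.
scavenge_times = [float(m.group(1))
                  for m in re.finditer(r"^(\d+\.\d+): \[GC .*?\[ParNew", LOG, re.M)]

# Gaps between consecutive scavenges, converted to milliseconds.
gaps_ms = [(b - a) * 1000.0 for a, b in zip(scavenge_times, scavenge_times[1:])]
max_gap_ms = max(gaps_ms)

# Use at least twice the maximum observed inter-scavenge gap.
suggested_wait_ms = 2 * round(max_gap_ms)
print(round(max_gap_ms), suggested_wait_ms)  # prints: 12230 24460
```

So from this (tiny) sample one might start with something like -XX:CMSWaitDuration=24460, or simply a round 25000; a real estimate should of course scan the whole log, not three lines.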
-- ramki
>
> Bartek
>
>
>> On 4/1/2011 2:16 AM, Bartek Markocki wrote:
>>> Hi all,
>>>
>>> Can I ask any of you to review the attached extracts from our
>>> production GC log and share your thoughts about them?
>>>
>>> We have a router-type web application running under Tomcat 6.0.28 with
>>> Java 1.6.0_21 (64-bit) on RHEL 5.2 (2.6.18-92.1.22.el5). The GC
>>> settings are:
>>> -Xmx2048m -Xms2048m -XX:NewSize=1024m
>>> -XX:PermSize=64m -XX:MaxPermSize=128m
>>> -XX:ThreadStackSize=128
>>> -XX:+DisableExplicitGC
>>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>> -XX:+PrintGCDetails
>>>
>>> What we did:
>>>
>>> Lately we have changed from ParallelOld to CMS due to unacceptably
>>> long Full GC pause times. In preparation for the change of
>>> collector we performed a lot of GC-tuning tests and found
>>> that the above (simple) set of settings fulfills our needs best.
>>> So far we are happy with what we see (frequency of minor scans/CMS
>>> cycles, times of STW pauses) with one exception.
>>>
>>>
>>> What is the problem:
>>>
>>> Some of our remark phases last much longer than others (up to 8 times
>>> the average). A normal remark phase lasts between 55 and 90 ms; the
>>> longest one lasted 538 ms.
>>> At first we thought this was due to aborting the preceding
>>> abortable-preclean phase. After a closer look we found that,
>>> depending on the volume of traffic (i.e., time of day), some of
>>> our abortable-preclean phases are indeed aborted due to the time
>>> limit (5 sec). Despite that, most of the subsequent remark phase
>>> times are still within an acceptable limit (up to 100 ms). So we kept
>>> digging. As a result we found that the abnormally long remark phases
>>> are preceded by an aborted abortable-preclean phase. The phase was
>>> always aborted due to the time limit; however, looking at the young
>>> generation occupancy reported at the following remark, in all cases
>>> we found that the YG was far more than 50% occupied.
>>> Per my (current :)) understanding, the abortable-preclean phase can
>>> be aborted either due to the time limit or because the YG is about
>>> 50% full (so the remark phase happens roughly midway between two
>>> minor collections), whichever comes first. In our case the 'about
>>> 50%' condition does not fire and the phase continues until it hits
>>> the time limit. The following remark phase then always lasts longer,
>>> i.e., 350-550 ms.
>>>
>>>
>>> The big question:
>>>
>>> What can we do to cut down the time of those long lasting remark phases?
>>>
>>>
>>> Below I enclose three samples from our GC log, presenting:
>>> first - a CMS cycle whose abortable-preclean phase was aborted due
>>> to the time limit, where the following remark phase does not show
>>> the abnormal behavior;
>>> second - an "ideal" CMS cycle;
>>> third - a CMS cycle whose abortable-preclean phase was aborted (due
>>> to the time limit even though YG occupancy is much greater than 50%),
>>> where the following remark phase lasts for 0.5 seconds.
>>>
>>> --
>>> 1142110.458: [GC 1142110.458: [ParNew: 888646K->45370K(943744K),
>>> 0.0728880 secs] 1852227K->1013124K(1992320K), 0.0739250 secs] [Times:
>>> user=0.33 sys=0.01, real=0.07 secs]
>>> 1142110.547: [GC [1 CMS-initial-mark: 967753K(1048576K)]
>>> 1013331K(1992320K), 0.0540170 secs] [Times: user=0.06 sys=0.00,
>>> real=0.05 secs]
>>> 1142110.602: [CMS-concurrent-mark-start]
>>> 1142111.010: [CMS-concurrent-mark: 0.408/0.408 secs] [Times: user=1.96
>>> sys=0.07, real=0.41 secs]
>>> 1142111.011: [CMS-concurrent-preclean-start]
>>> 1142111.028: [CMS-concurrent-preclean: 0.016/0.017 secs] [Times:
>>> user=0.02 sys=0.00, real=0.02 secs]
>>> 1142111.028: [CMS-concurrent-abortable-preclean-start]
>>> CMS: abort preclean due to time 1142116.036:
>>> [CMS-concurrent-abortable-preclean: 4.858/5.007 secs] [Times:
>>> user=7.31 sys=0.57, real=5.00 secs]
>>> 1142116.050: [GC[YG occupancy: 409639 K (943744 K)]1142116.051:
>>> [Rescan (parallel) , 0.0389910 secs]1142116.090: [weak refs
>>> processing, 0.0156130 secs] [1 CMS-remark: 967753K(1048576K)]
>>> 1377393K(1992320K), 0.0554700 secs] [Times: user=0.50 sys=0.00,
>>> real=0.06 secs]
>>> 1142116.107: [CMS-concurrent-sweep-start]
>>> 1142117.721: [CMS-concurrent-sweep: 1.614/1.614 secs] [Times:
>>> user=2.41 sys=0.24, real=1.61 secs]
>>> 1142117.721: [CMS-concurrent-reset-start]
>>> 1142117.732: [CMS-concurrent-reset: 0.010/0.010 secs] [Times:
>>> user=0.01 sys=0.00, real=0.01 secs]
>>> 1142121.278: [GC 1142121.279: [ParNew: 884282K->52652K(943744K),
>>> 0.0680850 secs] 1200273K->372087K(1992320K), 0.0690040 secs] [Times:
>>> user=0.29 sys=0.01, real=0.07 secs]
>>> 1142133.508: [GC 1142133.508: [ParNew: 891564K->47435K(943744K),
>>> 0.0682080 secs] 1210999K->370280K(1992320K), 0.0691030 secs] [Times:
>>> user=0.29 sys=0.01, real=0.07 secs]
>>> --
>>> 1165584.305: [GC 1165584.305: [ParNew: 896212K->59055K(943744K),
>>> 0.0761290 secs] 1857148K->1023947K(1992320K), 0.0771330 secs] [Times:
>>> user=0.33 sys=0.00, real=0.08 secs]
>>> 1165584.398: [GC [1 CMS-initial-mark: 964891K(1048576K)]
>>> 1024053K(1992320K), 0.0631010 secs] [Times: user=0.06 sys=0.00,
>>> real=0.06 secs]
>>> 1165584.463: [CMS-concurrent-mark-start]
>>> 1165584.933: [CMS-concurrent-mark: 0.423/0.471 secs] [Times: user=2.40
>>> sys=0.21, real=0.47 secs]
>>> 1165584.934: [CMS-concurrent-preclean-start]
>>> 1165584.954: [CMS-concurrent-preclean: 0.018/0.021 secs] [Times:
>>> user=0.05 sys=0.00, real=0.02 secs]
>>> 1165584.955: [CMS-concurrent-abortable-preclean-start]
>>> 1165587.876: [CMS-concurrent-abortable-preclean: 2.884/2.921 secs]
>>> [Times: user=5.51 sys=0.65, real=2.92 secs]
>>> 1165587.892: [GC[YG occupancy: 479051 K (943744 K)]1165587.892:
>>> [Rescan (parallel) , 0.0746810 secs]1165587.967: [weak refs
>>> processing, 0.0168870 secs] [1 CMS-remark: 964891K(1048576K)]
>>> 1443943K(1992320K), 0.0925600 secs] [Times: user=0.91 sys=0.01,
>>> real=0.09 secs]
>>> 1165587.986: [CMS-concurrent-sweep-start]
>>> 1165589.670: [CMS-concurrent-sweep: 1.684/1.684 secs] [Times:
>>> user=3.39 sys=0.46, real=1.69 secs]
>>> 1165589.671: [CMS-concurrent-reset-start]
>>> 1165589.679: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
>>> user=0.01 sys=0.00, real=0.01 secs]
>>> 1165591.354: [GC 1165591.354: [ParNew: 897967K->54984K(943744K),
>>> 0.0862910 secs] 1236513K->397404K(1992320K), 0.0872930 secs] [Times:
>>> user=0.34 sys=0.00, real=0.09 secs]
>>> 1165598.887: [GC 1165598.888: [ParNew: 893896K->52086K(943744K),
>>> 0.0885510 secs] 1236316K->398587K(1992320K), 0.0895820 secs] [Times:
>>> user=0.31 sys=0.01, real=0.09 secs]
>>> --
>>> 1166753.770: [GC 1166753.770: [ParNew: 899148K->57315K(943744K),
>>> 0.0782510 secs] 1862058K->1024198K(1992320K), 0.0793040 secs] [Times:
>>> user=0.32 sys=0.01, real=0.08 secs]
>>> 1166753.867: [GC [1 CMS-initial-mark: 966883K(1048576K)]
>>> 1024305K(1992320K), 0.0642680 secs] [Times: user=0.07 sys=0.00,
>>> real=0.07 secs]
>>> 1166753.932: [CMS-concurrent-mark-start]
>>> 1166754.471: [CMS-concurrent-mark: 0.486/0.538 secs] [Times: user=2.76
>>> sys=0.28, real=0.54 secs]
>>> 1166754.471: [CMS-concurrent-preclean-start]
>>> 1166754.488: [CMS-concurrent-preclean: 0.015/0.017 secs] [Times:
>>> user=0.04 sys=0.00, real=0.01 secs]
>>> 1166754.488: [CMS-concurrent-abortable-preclean-start]
>>> CMS: abort preclean due to time 1166759.533:
>>> [CMS-concurrent-abortable-preclean: 4.895/5.044 secs] [Times:
>>> user=9.75 sys=1.21, real=5.05 secs]
>>> 1166759.549: [GC[YG occupancy: 791197 K (943744 K)]1166759.549:
>>> [Rescan (parallel) , 0.5387660 secs]1166760.088: [weak refs
>>> processing, 0.0139780 secs] [1 CMS-remark: 966883K(1048576K)]
>>> 1758080K(1992320K), 0.5537750 secs] [Times: user=5.58 sys=0.06,
>>> real=0.56 secs]
>>> 1166760.105: [CMS-concurrent-sweep-start]
>>> 1166760.688: [GC 1166760.689: [ParNew: 896188K->57161K(943744K),
>>> 0.0727850 secs] 1623884K->788963K(1992320K), 0.0737390 secs] [Times:
>>> user=0.31 sys=0.02, real=0.08 secs]
>>> 1166761.593: [CMS-concurrent-sweep: 1.363/1.488 secs] [Times:
>>> user=3.48 sys=0.49, real=1.49 secs]
>>> 1166761.593: [CMS-concurrent-reset-start]
>>> 1166761.602: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
>>> user=0.02 sys=0.01, real=0.01 secs]
>>> 1166767.947: [GC 1166767.948: [ParNew: 896053K->58188K(943744K),
>>> 0.0817680 secs] 1238926K->404605K(1992320K), 0.0828270 secs] [Times:
>>> user=0.31 sys=0.01, real=0.08 secs]
>>> --
>>>
>>> Thank you in advance,
>>> Bartek
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>