Why abortable-preclean phase is not being aborted after YG occupancy exceeds 50%?

Y. Srinivas Ramakrishna y.s.ramakrishna at oracle.com
Fri Apr 1 16:30:19 UTC 2011


Hi Bartek --

Try -XX:+CMSSCavengeBeforeRemark as a temporary workaround
for this, and let us know if the performance is reasonable
or not.

I'll look at your log (can you send me your whole GC log,
showing the problem, off-list?).
I think there's probably an open CR for this, which i'll
dig up for you.

-- ramki

On 4/1/2011 2:16 AM, Bartek Markocki wrote:
> Hi all,
>
> Can I ask any of you to review the attached extracts from our
> production GC log and share your thoughts about them?
>
> We have a router-type web application running under tomcat 6.0.28 with
> Java 1.6.0_21 (64bit) on RHEL 5.2 (2.6.18-92.1.22.el5). The GC
> settings are:
> -Xmx2048m -Xms2048m -XX:NewSize=1024m
> -XX:PermSize=64m -XX:MaxPermSize=128m
> -XX:ThreadStackSize=128
> -XX:+DisableExplicitGC
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:+PrintGCDetails
>
> What we did:
>
> Lately we have changed from ParallelOld to CMS due to unacceptable
> long Full GC pauses times. In preparation to the change of the
> collector we performed a lot of GC tuning related tests and found out
> that the above (simple) set of settings fulfill our needs in the best
> way.
> So far we are happy with what we see (frequency of minor scans/CMS
> cycles, times of STW pauses) with one exception.
>
>
> What is the problem:
>
> Some of our remark phases last much longer than others (up to 8 times
> on avg.). Normal remark phase lasts between 55 and 90ms, the longest
> one lasted for 538ms.
> At first we thought that this is due to aborting the preceding
> abortable-preclean phase. After a closer look we found out that
> depending on the volume of traffic (i.e., time of day) in fact some of
> our abortable-preclean phases are aborted due to time limit (5sec).
> Despite that most of the following remark phases times still are
> within acceptable limit (up to 100ms). So we kept digging. As a result
> of that we found out that the abnormal long remark phases are preceded
> by aborted abortable-preclean phase. The phase was always aborted due
> to the time limit however if we have a look at the following report
> for the young generation occupancy in all cases we were able to find
> that YG was occupied in far more than 50%.
> Per my (current :)) understanding the abortable-preclean phase can be
> aborted due of the time limit or because YG got full in about 50% (so
> remark phase will happen midway during two minor collections) -
> whatever comes first. In our case the 'about 50%' condition is not
> executed and the phase continues until it hits the time limit. The
> following remark phase always last longer, i.e., 350-550ms.
>
>
> The big question:
>
> What can we do to cut down the time of those long lasting remark phases?
>
>
> Below I enclose three samples from our GC log presenting:
> first one - a CMS cycle that aborted the abortable-preclean phase due
> time limit and the following remark phase does not show the abnormal
> behavior.
> second one - an "ideal" CMS cycle
> third one - a CMS cycle with aborted the abortable-preclean phase (due
> to time limit even though YG occupancy is much greater than 50%) and
> the following remark phase lasts for 0.5second.
>
> --
> 1142110.458: [GC 1142110.458: [ParNew: 888646K->45370K(943744K),
> 0.0728880 secs] 1852227K->1013124K(1992320K), 0.0739250 secs] [Times:
> user=0.33 sys=0.01, real=0.07 secs]
> 1142110.547: [GC [1 CMS-initial-mark: 967753K(1048576K)]
> 1013331K(1992320K), 0.0540170 secs] [Times: user=0.06 sys=0.00,
> real=0.05 secs]
> 1142110.602: [CMS-concurrent-mark-start]
> 1142111.010: [CMS-concurrent-mark: 0.408/0.408 secs] [Times: user=1.96
> sys=0.07, real=0.41 secs]
> 1142111.011: [CMS-concurrent-preclean-start]
> 1142111.028: [CMS-concurrent-preclean: 0.016/0.017 secs] [Times:
> user=0.02 sys=0.00, real=0.02 secs]
> 1142111.028: [CMS-concurrent-abortable-preclean-start]
>   CMS: abort preclean due to time 1142116.036:
> [CMS-concurrent-abortable-preclean: 4.858/5.007 secs] [Times:
> user=7.31 sys=0.57, real=5.00 secs]
> 1142116.050: [GC[YG occupancy: 409639 K (943744 K)]1142116.051:
> [Rescan (parallel) , 0.0389910 secs]1142116.090: [weak refs
> processing, 0.0156130 secs] [1 CMS-remark: 967753K(1048576K)]
> 1377393K(1992320K), 0.0554700 secs] [Times: user=0.50 sys=0.00,
> real=0.06 secs]
> 1142116.107: [CMS-concurrent-sweep-start]
> 1142117.721: [CMS-concurrent-sweep: 1.614/1.614 secs] [Times:
> user=2.41 sys=0.24, real=1.61 secs]
> 1142117.721: [CMS-concurrent-reset-start]
> 1142117.732: [CMS-concurrent-reset: 0.010/0.010 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 1142121.278: [GC 1142121.279: [ParNew: 884282K->52652K(943744K),
> 0.0680850 secs] 1200273K->372087K(1992320K), 0.0690040 secs] [Times:
> user=0.29 sys=0.01, real=0.07 secs]
> 1142133.508: [GC 1142133.508: [ParNew: 891564K->47435K(943744K),
> 0.0682080 secs] 1210999K->370280K(1992320K), 0.0691030 secs] [Times:
> user=0.29 sys=0.01, real=0.07 secs]
> --
> 1165584.305: [GC 1165584.305: [ParNew: 896212K->59055K(943744K),
> 0.0761290 secs] 1857148K->1023947K(1992320K), 0.0771330 secs] [Times:
> user=0.33 sys=0.00, real=0.08 secs]
> 1165584.398: [GC [1 CMS-initial-mark: 964891K(1048576K)]
> 1024053K(1992320K), 0.0631010 secs] [Times: user=0.06 sys=0.00,
> real=0.06 secs]
> 1165584.463: [CMS-concurrent-mark-start]
> 1165584.933: [CMS-concurrent-mark: 0.423/0.471 secs] [Times: user=2.40
> sys=0.21, real=0.47 secs]
> 1165584.934: [CMS-concurrent-preclean-start]
> 1165584.954: [CMS-concurrent-preclean: 0.018/0.021 secs] [Times:
> user=0.05 sys=0.00, real=0.02 secs]
> 1165584.955: [CMS-concurrent-abortable-preclean-start]
> 1165587.876: [CMS-concurrent-abortable-preclean: 2.884/2.921 secs]
> [Times: user=5.51 sys=0.65, real=2.92 secs]
> 1165587.892: [GC[YG occupancy: 479051 K (943744 K)]1165587.892:
> [Rescan (parallel) , 0.0746810 secs]1165587.967: [weak refs
> processing, 0.0168870 secs] [1 CMS-remark: 964891K(1048576K)]
> 1443943K(1992320K), 0.0925600 secs] [Times: user=0.91 sys=0.01,
> real=0.09 secs]
> 1165587.986: [CMS-concurrent-sweep-start]
> 1165589.670: [CMS-concurrent-sweep: 1.684/1.684 secs] [Times:
> user=3.39 sys=0.46, real=1.69 secs]
> 1165589.671: [CMS-concurrent-reset-start]
> 1165589.679: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 1165591.354: [GC 1165591.354: [ParNew: 897967K->54984K(943744K),
> 0.0862910 secs] 1236513K->397404K(1992320K), 0.0872930 secs] [Times:
> user=0.34 sys=0.00, real=0.09 secs]
> 1165598.887: [GC 1165598.888: [ParNew: 893896K->52086K(943744K),
> 0.0885510 secs] 1236316K->398587K(1992320K), 0.0895820 secs] [Times:
> user=0.31 sys=0.01, real=0.09 secs]
> --
> 1166753.770: [GC 1166753.770: [ParNew: 899148K->57315K(943744K),
> 0.0782510 secs] 1862058K->1024198K(1992320K), 0.0793040 secs] [Times:
> user=0.32 sys=0.01, real=0.08 secs]
> 1166753.867: [GC [1 CMS-initial-mark: 966883K(1048576K)]
> 1024305K(1992320K), 0.0642680 secs] [Times: user=0.07 sys=0.00,
> real=0.07 secs]
> 1166753.932: [CMS-concurrent-mark-start]
> 1166754.471: [CMS-concurrent-mark: 0.486/0.538 secs] [Times: user=2.76
> sys=0.28, real=0.54 secs]
> 1166754.471: [CMS-concurrent-preclean-start]
> 1166754.488: [CMS-concurrent-preclean: 0.015/0.017 secs] [Times:
> user=0.04 sys=0.00, real=0.01 secs]
> 1166754.488: [CMS-concurrent-abortable-preclean-start]
>   CMS: abort preclean due to time 1166759.533:
> [CMS-concurrent-abortable-preclean: 4.895/5.044 secs] [Times:
> user=9.75 sys=1.21, real=5.05 secs]
> 1166759.549: [GC[YG occupancy: 791197 K (943744 K)]1166759.549:
> [Rescan (parallel) , 0.5387660 secs]1166760.088: [weak refs
> processing, 0.0139780 secs] [1 CMS-remark: 966883K(1048576K)]
> 1758080K(1992320K), 0.5537750 secs] [Times: user=5.58 sys=0.06,
> real=0.56 secs]
> 1166760.105: [CMS-concurrent-sweep-start]
> 1166760.688: [GC 1166760.689: [ParNew: 896188K->57161K(943744K),
> 0.0727850 secs] 1623884K->788963K(1992320K), 0.0737390 secs] [Times:
> user=0.31 sys=0.02, real=0.08 secs]
> 1166761.593: [CMS-concurrent-sweep: 1.363/1.488 secs] [Times:
> user=3.48 sys=0.49, real=1.49 secs]
> 1166761.593: [CMS-concurrent-reset-start]
> 1166761.602: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
> user=0.02 sys=0.01, real=0.01 secs]
> 1166767.947: [GC 1166767.948: [ParNew: 896053K->58188K(943744K),
> 0.0817680 secs] 1238926K->404605K(1992320K), 0.0828270 secs] [Times:
> user=0.31 sys=0.01, real=0.08 secs]
> --
>
> Thank you in advance,
> Bartek
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use



More information about the hotspot-gc-dev mailing list