Why abortable-preclean phase is not being aborted after YG occupancy exceeds 50%?

Bartek Markocki bartosz.markocki at gmail.com
Fri Apr 1 09:16:31 UTC 2011


Hi all,

Can I ask any of you to review the attached extracts from our
production GC log and share your thoughts about them?

We have a router-type web application running under tomcat 6.0.28 with
Java 1.6.0_21 (64bit) on RHEL 5.2 (2.6.18-92.1.22.el5). The GC
settings are:
-Xmx2048m -Xms2048m -XX:NewSize=1024m
-XX:PermSize=64m -XX:MaxPermSize=128m
-XX:ThreadStackSize=128
-XX:+DisableExplicitGC
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+PrintGCDetails

What we did:

Lately we have changed from ParallelOld to CMS due to unacceptable
long Full GC pauses times. In preparation to the change of the
collector we performed a lot of GC tuning related tests and found out
that the above (simple) set of settings fulfill our needs in the best
way.
So far we are happy with what we see (frequency of minor scans/CMS
cycles, times of STW pauses) with one exception.


What is the problem:

Some of our remark phases last much longer than others (up to 8 times
on avg.). Normal remark phase lasts between 55 and 90ms, the longest
one lasted for 538ms.
At first we thought that this is due to aborting the preceding
abortable-preclean phase. After a closer look we found out that
depending on the volume of traffic (i.e., time of day) in fact some of
our abortable-preclean phases are aborted due to time limit (5sec).
Despite that most of the following remark phases times still are
within acceptable limit (up to 100ms). So we kept digging. As a result
of that we found out that the abnormal long remark phases are preceded
by aborted abortable-preclean phase. The phase was always aborted due
to the time limit however if we have a look at the following report
for the young generation occupancy in all cases we were able to find
that YG was occupied in far more than 50%.
Per my (current :)) understanding the abortable-preclean phase can be
aborted due of the time limit or because YG got full in about 50% (so
remark phase will happen midway during two minor collections) -
whatever comes first. In our case the 'about 50%' condition is not
executed and the phase continues until it hits the time limit. The
following remark phase always last longer, i.e., 350-550ms.


The big question:

What can we do to cut down the time of those long lasting remark phases?


Below I enclose three samples from our GC log presenting:
first one - a CMS cycle that aborted the abortable-preclean phase due
time limit and the following remark phase does not show the abnormal
behavior.
second one - an "ideal" CMS cycle
third one - a CMS cycle with aborted the abortable-preclean phase (due
to time limit even though YG occupancy is much greater than 50%) and
the following remark phase lasts for 0.5second.

--
1142110.458: [GC 1142110.458: [ParNew: 888646K->45370K(943744K),
0.0728880 secs] 1852227K->1013124K(1992320K), 0.0739250 secs] [Times:
user=0.33 sys=0.01, real=0.07 secs]
1142110.547: [GC [1 CMS-initial-mark: 967753K(1048576K)]
1013331K(1992320K), 0.0540170 secs] [Times: user=0.06 sys=0.00,
real=0.05 secs]
1142110.602: [CMS-concurrent-mark-start]
1142111.010: [CMS-concurrent-mark: 0.408/0.408 secs] [Times: user=1.96
sys=0.07, real=0.41 secs]
1142111.011: [CMS-concurrent-preclean-start]
1142111.028: [CMS-concurrent-preclean: 0.016/0.017 secs] [Times:
user=0.02 sys=0.00, real=0.02 secs]
1142111.028: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 1142116.036:
[CMS-concurrent-abortable-preclean: 4.858/5.007 secs] [Times:
user=7.31 sys=0.57, real=5.00 secs]
1142116.050: [GC[YG occupancy: 409639 K (943744 K)]1142116.051:
[Rescan (parallel) , 0.0389910 secs]1142116.090: [weak refs
processing, 0.0156130 secs] [1 CMS-remark: 967753K(1048576K)]
1377393K(1992320K), 0.0554700 secs] [Times: user=0.50 sys=0.00,
real=0.06 secs]
1142116.107: [CMS-concurrent-sweep-start]
1142117.721: [CMS-concurrent-sweep: 1.614/1.614 secs] [Times:
user=2.41 sys=0.24, real=1.61 secs]
1142117.721: [CMS-concurrent-reset-start]
1142117.732: [CMS-concurrent-reset: 0.010/0.010 secs] [Times:
user=0.01 sys=0.00, real=0.01 secs]
1142121.278: [GC 1142121.279: [ParNew: 884282K->52652K(943744K),
0.0680850 secs] 1200273K->372087K(1992320K), 0.0690040 secs] [Times:
user=0.29 sys=0.01, real=0.07 secs]
1142133.508: [GC 1142133.508: [ParNew: 891564K->47435K(943744K),
0.0682080 secs] 1210999K->370280K(1992320K), 0.0691030 secs] [Times:
user=0.29 sys=0.01, real=0.07 secs]
--
1165584.305: [GC 1165584.305: [ParNew: 896212K->59055K(943744K),
0.0761290 secs] 1857148K->1023947K(1992320K), 0.0771330 secs] [Times:
user=0.33 sys=0.00, real=0.08 secs]
1165584.398: [GC [1 CMS-initial-mark: 964891K(1048576K)]
1024053K(1992320K), 0.0631010 secs] [Times: user=0.06 sys=0.00,
real=0.06 secs]
1165584.463: [CMS-concurrent-mark-start]
1165584.933: [CMS-concurrent-mark: 0.423/0.471 secs] [Times: user=2.40
sys=0.21, real=0.47 secs]
1165584.934: [CMS-concurrent-preclean-start]
1165584.954: [CMS-concurrent-preclean: 0.018/0.021 secs] [Times:
user=0.05 sys=0.00, real=0.02 secs]
1165584.955: [CMS-concurrent-abortable-preclean-start]
1165587.876: [CMS-concurrent-abortable-preclean: 2.884/2.921 secs]
[Times: user=5.51 sys=0.65, real=2.92 secs]
1165587.892: [GC[YG occupancy: 479051 K (943744 K)]1165587.892:
[Rescan (parallel) , 0.0746810 secs]1165587.967: [weak refs
processing, 0.0168870 secs] [1 CMS-remark: 964891K(1048576K)]
1443943K(1992320K), 0.0925600 secs] [Times: user=0.91 sys=0.01,
real=0.09 secs]
1165587.986: [CMS-concurrent-sweep-start]
1165589.670: [CMS-concurrent-sweep: 1.684/1.684 secs] [Times:
user=3.39 sys=0.46, real=1.69 secs]
1165589.671: [CMS-concurrent-reset-start]
1165589.679: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
user=0.01 sys=0.00, real=0.01 secs]
1165591.354: [GC 1165591.354: [ParNew: 897967K->54984K(943744K),
0.0862910 secs] 1236513K->397404K(1992320K), 0.0872930 secs] [Times:
user=0.34 sys=0.00, real=0.09 secs]
1165598.887: [GC 1165598.888: [ParNew: 893896K->52086K(943744K),
0.0885510 secs] 1236316K->398587K(1992320K), 0.0895820 secs] [Times:
user=0.31 sys=0.01, real=0.09 secs]
--
1166753.770: [GC 1166753.770: [ParNew: 899148K->57315K(943744K),
0.0782510 secs] 1862058K->1024198K(1992320K), 0.0793040 secs] [Times:
user=0.32 sys=0.01, real=0.08 secs]
1166753.867: [GC [1 CMS-initial-mark: 966883K(1048576K)]
1024305K(1992320K), 0.0642680 secs] [Times: user=0.07 sys=0.00,
real=0.07 secs]
1166753.932: [CMS-concurrent-mark-start]
1166754.471: [CMS-concurrent-mark: 0.486/0.538 secs] [Times: user=2.76
sys=0.28, real=0.54 secs]
1166754.471: [CMS-concurrent-preclean-start]
1166754.488: [CMS-concurrent-preclean: 0.015/0.017 secs] [Times:
user=0.04 sys=0.00, real=0.01 secs]
1166754.488: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 1166759.533:
[CMS-concurrent-abortable-preclean: 4.895/5.044 secs] [Times:
user=9.75 sys=1.21, real=5.05 secs]
1166759.549: [GC[YG occupancy: 791197 K (943744 K)]1166759.549:
[Rescan (parallel) , 0.5387660 secs]1166760.088: [weak refs
processing, 0.0139780 secs] [1 CMS-remark: 966883K(1048576K)]
1758080K(1992320K), 0.5537750 secs] [Times: user=5.58 sys=0.06,
real=0.56 secs]
1166760.105: [CMS-concurrent-sweep-start]
1166760.688: [GC 1166760.689: [ParNew: 896188K->57161K(943744K),
0.0727850 secs] 1623884K->788963K(1992320K), 0.0737390 secs] [Times:
user=0.31 sys=0.02, real=0.08 secs]
1166761.593: [CMS-concurrent-sweep: 1.363/1.488 secs] [Times:
user=3.48 sys=0.49, real=1.49 secs]
1166761.593: [CMS-concurrent-reset-start]
1166761.602: [CMS-concurrent-reset: 0.009/0.009 secs] [Times:
user=0.02 sys=0.01, real=0.01 secs]
1166767.947: [GC 1166767.948: [ParNew: 896053K->58188K(943744K),
0.0817680 secs] 1238926K->404605K(1992320K), 0.0828270 secs] [Times:
user=0.31 sys=0.01, real=0.08 secs]
--

Thank you in advance,
Bartek
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use



More information about the hotspot-gc-dev mailing list