Promotion failures: indication of CMS fragmentation?

Taras Tielkes taras.tielkes at gmail.com
Tue Dec 27 05:07:06 PST 2011


Hi,

We're running an application with the CMS/ParNew collectors that is
experiencing occasional promotion failures.
Environment is Linux 2.6.18 (x64), JVM is 1.6.0_29 in server mode.
I've listed the specific JVM options used below (a).

The application is deployed across a handful of machines, and the
promotion failures are fairly uniform across those.

The first kind of failure we observe is a promotion failure during
ParNew collection, I've included a snipped from the gc log below (b).
The second kind of failure is a concurrrent mode failure (perhaps
triggered by the same cause), see (c) below.
The frequency (after running for a some weeks) is approximately once
per day. This is bearable, but obviously we'd like to improve on this.

Apart from high-volume request handling (which allocates a lot of
small objects), the application also runs a few dozen background
threads that download and process XML documents, typically in the 5-30
MB range.
A known deficiency in the existing code is that the XML content is
copied twice before processing (once to a byte[], and later again to a
String/char[]).
Given that a 30 MB XML stream will result in a 60 MB
java.lang.String/char[], my suspicion is that these big array
allocations are causing us to run into the CMS fragmentation issue.

My questions are:
1) Does the data from the GC logs provide sufficient evidence to
conclude that CMS fragmentation is the cause of the promotion failure?
2) If not, what's the next step of investigating the cause?
3) We're planning to at least add -XX:+PrintPromotionFailure to get a
feeling for the size of the objects that fail promotion.
Overall, it seem that -XX:PrintFLSStatistics=1 is actually the only
reliable approach to diagnose CMS fragmentation. Is this indeed the
case?

Thanks in advance,
Taras

a) Current JVM options:
--------------------------------
-server
-Xms5g
-Xmx5g
-Xmn400m
-XX:PermSize=256m
-XX:MaxPermSize=256m
-XX:+PrintGCTimeStamps
-verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:SurvivorRatio=8
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark
-XX:CMSInitiatingOccupancyFraction=68
-Xloggc:gc.log
--------------------------------

b) Promotion failure during ParNew
--------------------------------
2011-12-08T18:14:40.966+0100: 219729.868: [GC 219729.868: [ParNew:
368640K->40959K(368640K), 0.0693460 secs]
3504917K->3195098K(5201920K), 0.0696500 secs] [Times: user=0.39
sys=0.01, real=0.07 secs]
2011-12-08T18:14:43.778+0100: 219732.679: [GC 219732.679: [ParNew:
368639K->31321K(368640K), 0.0511400 secs]
3522778K->3198316K(5201920K), 0.0514420 secs] [Times: user=0.28
sys=0.00, real=0.05 secs]
2011-12-08T18:14:46.945+0100: 219735.846: [GC 219735.846: [ParNew:
359001K->18694K(368640K), 0.0272970 secs]
3525996K->3185690K(5201920K), 0.0276080 secs] [Times: user=0.19
sys=0.00, real=0.03 secs]
2011-12-08T18:14:49.036+0100: 219737.938: [GC 219737.938: [ParNew
(promotion failed): 338813K->361078K(368640K), 0.1321200
secs]219738.070: [CMS: 3167747K->434291K(4833280K), 4.8881570 secs]
3505808K->434291K
(5201920K), [CMS Perm : 116893K->116883K(262144K)], 5.0206620 secs]
[Times: user=5.24 sys=0.00, real=5.02 secs]
2011-12-08T18:14:54.721+0100: 219743.622: [GC 219743.623: [ParNew:
327680K->40960K(368640K), 0.0949460 secs] 761971K->514584K(5201920K),
0.0952820 secs] [Times: user=0.52 sys=0.04, real=0.10 secs]
2011-12-08T18:14:55.580+0100: 219744.481: [GC 219744.482: [ParNew:
368640K->40960K(368640K), 0.1299190 secs] 842264K->625681K(5201920K),
0.1302190 secs] [Times: user=0.72 sys=0.01, real=0.13 secs]
2011-12-08T18:14:58.050+0100: 219746.952: [GC 219746.952: [ParNew:
368640K->40960K(368640K), 0.0870940 secs] 953361K->684121K(5201920K),
0.0874110 secs] [Times: user=0.48 sys=0.01, real=0.09 secs]
--------------------------------

c) Promotion failure during CMS
--------------------------------
2011-12-14T08:29:26.628+0100: 703015.530: [GC 703015.530: [ParNew:
357228K->40960K(368640K), 0.0525110 secs]
3603068K->3312743K(5201920K), 0.0528120 secs] [Times: user=0.37
sys=0.00, real=0.05 secs]
2011-12-14T08:29:28.864+0100: 703017.766: [GC 703017.766: [ParNew:
366075K->37119K(368640K), 0.0479780 secs]
3637859K->3317662K(5201920K), 0.0483090 secs] [Times: user=0.24
sys=0.01, real=0.05 secs]
2011-12-14T08:29:29.553+0100: 703018.454: [GC 703018.455: [ParNew:
364792K->40960K(368640K), 0.0421740 secs]
3645334K->3334944K(5201920K), 0.0424810 secs] [Times: user=0.30
sys=0.00, real=0.04 secs]
2011-12-14T08:29:29.600+0100: 703018.502: [GC [1 CMS-initial-mark:
3293984K(4833280K)] 3335025K(5201920K), 0.0272490 secs] [Times:
user=0.02 sys=0.00, real=0.03 secs]
2011-12-14T08:29:29.628+0100: 703018.529: [CMS-concurrent-mark-start]
2011-12-14T08:29:30.718+0100: 703019.620: [GC 703019.620: [ParNew:
368640K->40960K(368640K), 0.0836690 secs]
3662624K->3386039K(5201920K), 0.0839690 secs] [Times: user=0.50
sys=0.01, real=0.08 secs]
2011-12-14T08:29:30.827+0100: 703019.729: [CMS-concurrent-mark:
1.108/1.200 secs] [Times: user=6.83 sys=0.23, real=1.20 secs]
2011-12-14T08:29:30.827+0100: 703019.729: [CMS-concurrent-preclean-start]
2011-12-14T08:29:30.938+0100: 703019.840: [CMS-concurrent-preclean:
0.093/0.111 secs] [Times: user=0.48 sys=0.02, real=0.11 secs]
2011-12-14T08:29:30.938+0100: 703019.840:
[CMS-concurrent-abortable-preclean-start]
2011-12-14T08:29:32.337+0100: 703021.239:
[CMS-concurrent-abortable-preclean: 1.383/1.399 secs] [Times:
user=6.68 sys=0.27, real=1.40 secs]
2011-12-14T08:29:32.343+0100: 703021.244: [GC[YG occupancy: 347750 K
(368640 K)]2011-12-14T08:29:32.343+0100: 703021.244: [GC 703021.244:
[ParNew (promotion failed): 347750K->347750K(368640K), 9.8729020 secs]
 3692829K->3718580K(5201920K), 9.8732380 secs] [Times: user=12.00
sys=2.58, real=9.88 secs]
703031.118: [Rescan (parallel) , 0.2826110 secs]703031.400: [weak refs
processing, 0.0014780 secs]703031.402: [class unloading, 0.0176610
secs]703031.419: [scrub symbol & string tables, 0.0094960 secs] [1 CMS
-remark: 3370830K(4833280K)] 3718580K(5201920K), 10.1916910 secs]
[Times: user=13.73 sys=2.59, real=10.19 secs]
2011-12-14T08:29:42.535+0100: 703031.436: [CMS-concurrent-sweep-start]
2011-12-14T08:29:42.591+0100: 703031.493: [Full GC 703031.493:
[CMS2011-12-14T08:29:48.616+0100: 703037.518: [CMS-concurrent-sweep:
6.046/6.082 secs] [Times: user=6.18 sys=0.01, real=6.09 secs]
 (concurrent mode failure): 3370829K->433437K(4833280K), 10.9594300
secs] 3739469K->433437K(5201920K), [CMS Perm :
121702K->121690K(262144K)], 10.9597540 secs] [Times: user=10.95
sys=0.00, real=10.96 secs]
2011-12-14T08:29:53.997+0100: 703042.899: [GC 703042.899: [ParNew:
327680K->40960K(368640K), 0.0799960 secs] 761117K->517836K(5201920K),
0.0804100 secs] [Times: user=0.46 sys=0.00, real=0.08 secs]
2011-12-14T08:29:54.649+0100: 703043.551: [GC 703043.551: [ParNew:
368640K->40960K(368640K), 0.0784460 secs] 845516K->557872K(5201920K),
0.0787920 secs] [Times: user=0.40 sys=0.01, real=0.08 secs]
2011-12-14T08:29:56.418+0100: 703045.320: [GC 703045.320: [ParNew:
368640K->40960K(368640K), 0.0784040 secs] 885552K->603017K(5201920K),
0.0787630 secs] [Times: user=0.41 sys=0.01, real=0.07 secs]
--------------------------------


More information about the hotspot-gc-use mailing list