Promotion failures: indication of CMS fragmentation?

Jon Masamitsu jon.masamitsu at oracle.com
Tue Dec 27 17:13:16 UTC 2011


Taras,

PrintPromotionFailure seems like it would go a long
way to identify the root of your promotion failures (or
at least eliminating some possible causes).    I think it
would help focus the discussion if you could send
the result of that experiment early.

Jon

On 12/27/2011 5:07 AM, Taras Tielkes wrote:
> Hi,
>
> We're running an application with the CMS/ParNew collectors that is
> experiencing occasional promotion failures.
> Environment is Linux 2.6.18 (x64), JVM is 1.6.0_29 in server mode.
> I've listed the specific JVM options used below (a).
>
> The application is deployed across a handful of machines, and the
> promotion failures are fairly uniform across those.
>
> The first kind of failure we observe is a promotion failure during
> ParNew collection, I've included a snipped from the gc log below (b).
> The second kind of failure is a concurrrent mode failure (perhaps
> triggered by the same cause), see (c) below.
> The frequency (after running for a some weeks) is approximately once
> per day. This is bearable, but obviously we'd like to improve on this.
>
> Apart from high-volume request handling (which allocates a lot of
> small objects), the application also runs a few dozen background
> threads that download and process XML documents, typically in the 5-30
> MB range.
> A known deficiency in the existing code is that the XML content is
> copied twice before processing (once to a byte[], and later again to a
> String/char[]).
> Given that a 30 MB XML stream will result in a 60 MB
> java.lang.String/char[], my suspicion is that these big array
> allocations are causing us to run into the CMS fragmentation issue.
>
> My questions are:
> 1) Does the data from the GC logs provide sufficient evidence to
> conclude that CMS fragmentation is the cause of the promotion failure?
> 2) If not, what's the next step of investigating the cause?
> 3) We're planning to at least add -XX:+PrintPromotionFailure to get a
> feeling for the size of the objects that fail promotion.
> Overall, it seem that -XX:PrintFLSStatistics=1 is actually the only
> reliable approach to diagnose CMS fragmentation. Is this indeed the
> case?
>
> Thanks in advance,
> Taras
>
> a) Current JVM options:
> --------------------------------
> -server
> -Xms5g
> -Xmx5g
> -Xmn400m
> -XX:PermSize=256m
> -XX:MaxPermSize=256m
> -XX:+PrintGCTimeStamps
> -verbose:gc
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:SurvivorRatio=8
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:+DisableExplicitGC
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+CMSClassUnloadingEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:CMSInitiatingOccupancyFraction=68
> -Xloggc:gc.log
> --------------------------------
>
> b) Promotion failure during ParNew
> --------------------------------
> 2011-12-08T18:14:40.966+0100: 219729.868: [GC 219729.868: [ParNew:
> 368640K->40959K(368640K), 0.0693460 secs]
> 3504917K->3195098K(5201920K), 0.0696500 secs] [Times: user=0.39
> sys=0.01, real=0.07 secs]
> 2011-12-08T18:14:43.778+0100: 219732.679: [GC 219732.679: [ParNew:
> 368639K->31321K(368640K), 0.0511400 secs]
> 3522778K->3198316K(5201920K), 0.0514420 secs] [Times: user=0.28
> sys=0.00, real=0.05 secs]
> 2011-12-08T18:14:46.945+0100: 219735.846: [GC 219735.846: [ParNew:
> 359001K->18694K(368640K), 0.0272970 secs]
> 3525996K->3185690K(5201920K), 0.0276080 secs] [Times: user=0.19
> sys=0.00, real=0.03 secs]
> 2011-12-08T18:14:49.036+0100: 219737.938: [GC 219737.938: [ParNew
> (promotion failed): 338813K->361078K(368640K), 0.1321200
> secs]219738.070: [CMS: 3167747K->434291K(4833280K), 4.8881570 secs]
> 3505808K->434291K
> (5201920K), [CMS Perm : 116893K->116883K(262144K)], 5.0206620 secs]
> [Times: user=5.24 sys=0.00, real=5.02 secs]
> 2011-12-08T18:14:54.721+0100: 219743.622: [GC 219743.623: [ParNew:
> 327680K->40960K(368640K), 0.0949460 secs] 761971K->514584K(5201920K),
> 0.0952820 secs] [Times: user=0.52 sys=0.04, real=0.10 secs]
> 2011-12-08T18:14:55.580+0100: 219744.481: [GC 219744.482: [ParNew:
> 368640K->40960K(368640K), 0.1299190 secs] 842264K->625681K(5201920K),
> 0.1302190 secs] [Times: user=0.72 sys=0.01, real=0.13 secs]
> 2011-12-08T18:14:58.050+0100: 219746.952: [GC 219746.952: [ParNew:
> 368640K->40960K(368640K), 0.0870940 secs] 953361K->684121K(5201920K),
> 0.0874110 secs] [Times: user=0.48 sys=0.01, real=0.09 secs]
> --------------------------------
>
> c) Promotion failure during CMS
> --------------------------------
> 2011-12-14T08:29:26.628+0100: 703015.530: [GC 703015.530: [ParNew:
> 357228K->40960K(368640K), 0.0525110 secs]
> 3603068K->3312743K(5201920K), 0.0528120 secs] [Times: user=0.37
> sys=0.00, real=0.05 secs]
> 2011-12-14T08:29:28.864+0100: 703017.766: [GC 703017.766: [ParNew:
> 366075K->37119K(368640K), 0.0479780 secs]
> 3637859K->3317662K(5201920K), 0.0483090 secs] [Times: user=0.24
> sys=0.01, real=0.05 secs]
> 2011-12-14T08:29:29.553+0100: 703018.454: [GC 703018.455: [ParNew:
> 364792K->40960K(368640K), 0.0421740 secs]
> 3645334K->3334944K(5201920K), 0.0424810 secs] [Times: user=0.30
> sys=0.00, real=0.04 secs]
> 2011-12-14T08:29:29.600+0100: 703018.502: [GC [1 CMS-initial-mark:
> 3293984K(4833280K)] 3335025K(5201920K), 0.0272490 secs] [Times:
> user=0.02 sys=0.00, real=0.03 secs]
> 2011-12-14T08:29:29.628+0100: 703018.529: [CMS-concurrent-mark-start]
> 2011-12-14T08:29:30.718+0100: 703019.620: [GC 703019.620: [ParNew:
> 368640K->40960K(368640K), 0.0836690 secs]
> 3662624K->3386039K(5201920K), 0.0839690 secs] [Times: user=0.50
> sys=0.01, real=0.08 secs]
> 2011-12-14T08:29:30.827+0100: 703019.729: [CMS-concurrent-mark:
> 1.108/1.200 secs] [Times: user=6.83 sys=0.23, real=1.20 secs]
> 2011-12-14T08:29:30.827+0100: 703019.729: [CMS-concurrent-preclean-start]
> 2011-12-14T08:29:30.938+0100: 703019.840: [CMS-concurrent-preclean:
> 0.093/0.111 secs] [Times: user=0.48 sys=0.02, real=0.11 secs]
> 2011-12-14T08:29:30.938+0100: 703019.840:
> [CMS-concurrent-abortable-preclean-start]
> 2011-12-14T08:29:32.337+0100: 703021.239:
> [CMS-concurrent-abortable-preclean: 1.383/1.399 secs] [Times:
> user=6.68 sys=0.27, real=1.40 secs]
> 2011-12-14T08:29:32.343+0100: 703021.244: [GC[YG occupancy: 347750 K
> (368640 K)]2011-12-14T08:29:32.343+0100: 703021.244: [GC 703021.244:
> [ParNew (promotion failed): 347750K->347750K(368640K), 9.8729020 secs]
>   3692829K->3718580K(5201920K), 9.8732380 secs] [Times: user=12.00
> sys=2.58, real=9.88 secs]
> 703031.118: [Rescan (parallel) , 0.2826110 secs]703031.400: [weak refs
> processing, 0.0014780 secs]703031.402: [class unloading, 0.0176610
> secs]703031.419: [scrub symbol&  string tables, 0.0094960 secs] [1 CMS
> -remark: 3370830K(4833280K)] 3718580K(5201920K), 10.1916910 secs]
> [Times: user=13.73 sys=2.59, real=10.19 secs]
> 2011-12-14T08:29:42.535+0100: 703031.436: [CMS-concurrent-sweep-start]
> 2011-12-14T08:29:42.591+0100: 703031.493: [Full GC 703031.493:
> [CMS2011-12-14T08:29:48.616+0100: 703037.518: [CMS-concurrent-sweep:
> 6.046/6.082 secs] [Times: user=6.18 sys=0.01, real=6.09 secs]
>   (concurrent mode failure): 3370829K->433437K(4833280K), 10.9594300
> secs] 3739469K->433437K(5201920K), [CMS Perm :
> 121702K->121690K(262144K)], 10.9597540 secs] [Times: user=10.95
> sys=0.00, real=10.96 secs]
> 2011-12-14T08:29:53.997+0100: 703042.899: [GC 703042.899: [ParNew:
> 327680K->40960K(368640K), 0.0799960 secs] 761117K->517836K(5201920K),
> 0.0804100 secs] [Times: user=0.46 sys=0.00, real=0.08 secs]
> 2011-12-14T08:29:54.649+0100: 703043.551: [GC 703043.551: [ParNew:
> 368640K->40960K(368640K), 0.0784460 secs] 845516K->557872K(5201920K),
> 0.0787920 secs] [Times: user=0.40 sys=0.01, real=0.08 secs]
> 2011-12-14T08:29:56.418+0100: 703045.320: [GC 703045.320: [ParNew:
> 368640K->40960K(368640K), 0.0784040 secs] 885552K->603017K(5201920K),
> 0.0787630 secs] [Times: user=0.41 sys=0.01, real=0.07 secs]
> --------------------------------
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use



More information about the hotspot-gc-dev mailing list