Unreasonably long ParNew, CMS pause times

Matt Fowles matt.fowles at gmail.com
Wed Jul 21 11:09:02 PDT 2010


Eagle~

What JVM settings are you using?  Can you attach a log with
-XX:+PrintTenuringDistribution as well?
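
Something along these lines would capture the detail I'm after (flag
names as in HotSpot JDK 6; gc.log is just a placeholder file name):

    -Xloggc:gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -XX:+PrintTenuringDistribution
    -XX:+PrintGCApplicationStoppedTime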

Matt

On Wed, Jul 21, 2010 at 1:09 PM, Eagle Kessler <eagle.kessler at gmail.com> wrote:

> Hello all,
>   I've got a web service that's showing some very strange behavior
> around its garbage collection, and I was wondering if someone here could
> explain why it might be happening. The service itself is fairly simple: Take
> in data from a few sources and merge them with the existing data in the
> database. It stores nearly no state while doing this, and indeed heap dumps
> taken 1, 24, and 72 hours after boot indicate that we have a consistent
> ~12 MB of live data (in a 2 GB heap, but I hope that's not what is causing
> this).
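>
> (For reference, live-object dumps of this sort can be taken with the
> JDK's jmap; <pid> and the file name here are placeholders:
>
>     jmap -dump:live,format=b,file=heap.hprof <pid>
>
> The live option restricts the dump to reachable objects, so the ~12 MB
> figure excludes garbage.)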
>
> The GC logs, though, don't look very happy at all. After our start-up
> period, they settle into a fairly consistent pattern:
>
> 1041.159: [GC 1041.159: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
> 1606.500: [GC 1606.500: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs]
> 2040.773: [GC 2040.773: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs]
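>
> (To spell the arithmetic out: in the first line the young gen goes
> 516224K -> 0K while the whole heap goes 537266K -> 22659K, so the old
> gen grows by
>
>     22659K - (537266K - 516224K) = ~1.6 MB
>
> per scavenge, which matches the ~1.5 MB steps across the later lines
> (22659K -> 24214K -> 25737K). And with "new threshold 0 (max 0)",
> objects are promoted on their first scavenge rather than aging in the
> survivor spaces.)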
>
> Which would be very nice if it kept going like that. However, by the
> time the CMS collector first runs, things aren't working nearly as well:
>
> 214182.439: [GC 214182.439: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs]
> 214247.437: [GC 214247.438: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs]
> 214313.642: [GC 214313.642: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs]
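>
> (Same arithmetic as before: 782575K -> 784188K -> 785710K is still
> only ~1.5 MB promoted per scavenge, yet the real pause has gone from
> ~0.02 seconds to ~1.2 seconds.)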
>
> The increasing sys time is a bit worrying, but it seems like the actual GC
> time is rising as well, even though we aren't collecting any more young-gen
> garbage than before. At this point, CMS kicked in:
>
> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs]
> 214382.589: [CMS-concurrent-mark-start]
> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs]
> 214383.056: [CMS-concurrent-preclean-start]
> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> 214383.064: [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs]
> 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs]
> 214389.705: [CMS-concurrent-sweep-start]
> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs]
> 214390.439: [CMS-concurrent-reset-start]
> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs]
>
> It seems like (a) initial-mark shouldn't take 1.8 seconds, (b) if we really
> do only have ~12 MB of live data, CMS should have collected a lot more than
> it did (the next ParNew collection reported ~545 MB of old gen in use), and
> (c) 50% heap usage with very little promotion seems very early for the
> collector to kick in.
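>
> (For (c): if it's the CMS initiating-occupancy heuristic firing early,
> I assume we could pin the trigger explicitly with something like the
> following, where 75 is just an example value:
>
>     -XX:CMSInitiatingOccupancyFraction=75
>     -XX:+UseCMSInitiatingOccupancyOnly
>
> though I'd rather understand why it fires at 50% in the first place.)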
>
> The next CMS cycle is at 434,973 seconds, by which point the young gen
> collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The
> initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82) and sweeps
> down to 1.1 GB of used old gen. I haven't yet confirmed it, but given the
> previous heap dumps I'd guess that those will turn out to be ~12 MB of live
> objects and ~1.1 GB of dead ones. The current young gen collections (at
> 497,601 seconds) are taking ~3.7 seconds (4.33 user, 2.03 sys). Any idea
> what could be going on here? We're running JDK 1.6.0_16.
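>
> (For confirming the old-gen picture without another heap dump, I
> assume jstat would do; <pid> is a placeholder and 1000 is the sample
> interval in milliseconds:
>
>     jstat -gcutil <pid> 1000
>
> That prints eden/survivor/old occupancy percentages plus GC counts and
> cumulative times once a second.)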
>
> --
> Eagle Kessler
>