Unreasonably long ParNew, CMS pause times
Y. S. Ramakrishna
y.s.ramakrishna at oracle.com
Wed Jul 21 12:34:46 PDT 2010
Yes, as I think Matt is getting at, I'd configure sufficiently
large survivor spaces. Even if you expect most of your objects to die young,
you'd want survivor spaces large enough to keep at least age-1 objects in
the survivor space. If, as you state, no medium- to long-lived state is
retained, your data is mostly short-lived and you'll be able to do without
any promotion at all. Your problem here is that your survivor
spaces seem to have disappeared somehow. (-XX:+PrintHeapAtGC will tell you, and
of course examining your JVM options should shed more light on that apparent
disappearance.)
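
For what it's worth, a minimal sketch of the kind of settings I have in mind
might look like the following; the sizes and threshold are purely illustrative,
not a recommendation for your workload:

  -Xmn512m -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=4 \
  -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution

With non-zero survivor spaces and a non-zero tenuring threshold,
-XX:+PrintTenuringDistribution will show how long your objects actually
survive before dying or being promoted, which should confirm (or refute)
the "mostly short-lived" assumption.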
-- ramki
On 07/21/10 10:09, Eagle Kessler wrote:
> Hello all,
> I've got a web service that is showing some very strange behavior
> around its garbage collection, and I was wondering if someone here could
> explain why it might be happening. The service itself is fairly simple:
> take in data from a few sources and merge them with the existing data in
> the database. It stores nearly no state while doing this, and indeed
> heap dumps taken 1, 24, and 72 hours after boot show a
> consistent ~12MB of live data (in a 2GB heap, but I hope that's not what
> is causing this).
>
> The GC logs, though, don't look very happy at all. After our start up
> period, they settle into a fairly consistent pattern:
>
> 1041.159: [GC 1041.159: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>
> : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
> 1606.500: [GC 1606.500: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>
> : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs]
> 2040.773: [GC 2040.773: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>
> : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs]
>
> Which would be very nice if it kept going like that. However, by the
> first time the CMS collector runs, things aren't working nearly as well:
>
> 214182.439: [GC 214182.439: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs]
>
> 214247.437: [GC 214247.438: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs]
>
> 214313.642: [GC 214313.642: [ParNew
> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
> : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs]
>
> The increasing sys time is a bit worrying, but the actual
> GC time seems to be rising as well, even though we aren't collecting any more
> young-gen garbage than before. At this point, CMS went off:
>
> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs]
>
> 214382.589: [CMS-concurrent-mark-start]
> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs]
> 214383.056: [CMS-concurrent-preclean-start]
> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>
> 214383.064: [CMS-concurrent-abortable-preclean-start]
> CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs]
> 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs]
>
> 214389.705: [CMS-concurrent-sweep-start]
> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs]
> 214390.439: [CMS-concurrent-reset-start]
> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs]
>
> It seems like a) the initial mark shouldn't take 1.8 seconds, b) if we
> really do have only 12MB of live data, CMS should have collected a lot
> more than it did (the next ParNew collection reported ~545MB of old gen
> in use), and c) 50% heap usage with very little promotion seems very
> early for the collector to go off.
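>
> (Regarding c, I assume the relevant knobs are the CMS initiating-occupancy
> flags, e.g. something like
>
>   -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>
> with the value here being purely illustrative. We haven't set either of them
> explicitly, so presumably the collector is using its own ergonomics to decide
> when to start a cycle.)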
>
> The next CMS cycle is at 434,973 seconds, by which point the young gen
> collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The
> initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and the
> cycle only sweeps the old gen down to 1.1GB in use. I haven't yet confirmed
> it, but given the previous heap dumps I'd guess a dump would show ~12MB of
> live objects and ~1.1GB of dead objects. The current young gen
> collections (at 497,601 seconds) are taking ~3.7 seconds (4.33 user,
> 2.03 sys). Any idea what could be going on here? We're running JDK 1.6_16.
>
> --
> Eagle Kessler
>
>