Unreasonably long ParNew, CMS pause times

Wed Jul 21 12:40:28 PDT 2010

oh, and the "(max 0)" seems to imply you have somehow ended up asking for
MaxTenuringThreshold=0 which can be disastrous. It is possible that the
default settings for CMS on certain very old JVM's did indeed result in
MTT=0 (but memory is fading of that era), so if using such older vintage
of JVM's, explicitly setting MTT and SurvivorRatio higher is a good idea.

-- ramki

On 07/21/10 12:34, Y. S. Ramakrishna wrote:
> Yes, like what i think Matt is getting at, i'd configure sufficiently
> large survivor spaces. Even if you expect most of your objects to die young,
> you'd want survivor spaces large enough to keep at least age 1 objects in
> the survivor space. If as you state no medium- to ling-lived state is
> retained, your data is mostly short-lived and you'll be able to do without
> any promotion at all. Your problem here is that somehow your survivor
> spaces may have disappeared. (+PrintHeapAtGC will tell you, and
> of course examining yr JVM options should shed more light on that apparent
> disappearance.)
> 
> -- ramki
> 
> On 07/21/10 10:09, Eagle Kessler wrote:
>> Hello all,
>>   I've got a web service that I'm seeing some very strange behavior on 
>> around it's garbage collection, and was wondering if someone here could 
>> explain why it might be happening. The service itself is fairly simple: 
>> Take in data from a few sources and merge them with the existing data in 
>> the database. It stores nearly no state while doing this, and indeed 
>> heap dumps taken 1, 24, and 72 hours after boot indicate that we have a 
>> consistent ~12mb of live data (in a 2GB heap, but I hope that's not what 
>> is causing this).
>>
>> The GC logs, though, don't look very happy at all. After our start up 
>> period, they settle into a fairly consistent pattern:
>>
>> 1041.159: [GC 1041.159: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>>
>> : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs] 
>> 1606.500: [GC 1606.500: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>>
>> : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs] 
>> 2040.773: [GC 2040.773: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>>
>> : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs] 
>>
>> Which we would be very nice if it kept going like that. However, by the 
>> first time the CMS collector runs, things aren't working nearly as well:
>>
>> 214182.439: [GC 214182.439: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs] 
>>
>> 214247.437: [GC 214247.438: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs] 
>>
>> 214313.642: [GC 214313.642: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs]
>>
>> The increasing sys time is a bit worrying, but it seems like the actual 
>> GC time is rising as well, even though we aren't collecting any more 
>> young-gen garbage. At this point, CMS went off
>>
>> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs] 
>>
>> 214382.589: [CMS-concurrent-mark-start]
>> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs] 
>> 214383.056: [CMS-concurrent-preclean-start]
>> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
>>
>> 214383.064: [CMS-concurrent-abortable-preclean-start]
>>  CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs] 
>> 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs] 
>>
>> 214389.705: [CMS-concurrent-sweep-start]
>> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs] 
>> 214390.439: [CMS-concurrent-reset-start]
>> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs]
>>
>> It seems like a) initial-mark shouldn't take 1.8 seconds, b) if we 
>> really do only have 12mb of live data, CMS should have collected a lot 
>> more than it did (the next ParNew collection reported ~545MB of old gen 
>> in use), and c) 50% heap usage with very little promotion seems very 
>> early for the collector to go off.
>>
>> The next CMS cycle is at 434,973 seconds, by which point the young gen 
>> collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The 
>> initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and 
>> sweeps down to 1.1gb of used old gen. I haven't yet confirmed it, but 
>> given the previous heap dumps I'd guess that they will claim 12mb of 
>> live objects and 1.1gb of dead objects. The current young gen 
>> collections (at 497,601 seconds) are taking ~3.7 seconds (4.33 user, 
>> 2.03 sys) Any idea what could be going on here? We're running JDK 1.6_16.
>>
>> -- 
>> Eagle Kessler
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use