CMS vs G1 - Scan RS very long

Michal Frajt michal at frajt.eu
Thu Jan 31 15:47:31 UTC 2013


Hi Kirk, 
 
We found the default calculated minimum eden size too large as well. You can shrink the eden to the "optimal" size by specifying the new size parameter (-XX:NewSize=16m). If it is not specified, the minimum eden size is adaptively calculated from the overall heap size using the G1DefaultMinNewGenPercent parameter (mind that some parameters got renamed again recently). If you specify NewRatio instead, you won't be able to control the min and max eden sizes at all. 
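For reference, a minimal command-line sketch of pinning the young generation to a fixed size so the ergonomics cannot grow it back (the heap values and jar name are only placeholders, not from our setup):

```shell
# Pin the young generation to 16 MB on both ends; with only -XX:NewSize
# set, G1 may still adaptively grow the young gen beyond it.
java -XX:+UseG1GC -XX:NewSize=16m -XX:MaxNewSize=16m \
     -Xms2048m -Xmx2048m -jar app.jar
```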

The CMS vs G1 test was done with exactly the same eden and survivor settings in order to keep the young collections invoked at the same frequency (CMS/ParNew ~20 sec, G1 ~16 sec). G1 delivered comparable pause times with an eden size of around 12 MB, but it was then invoked 10 times more frequently than ParNew with 128 MB. A small eden results in small survivor spaces: there is not much aging, everything gets promoted to the old regions, the occupied old regions trigger concurrent marking frequently, followed by very long mixed collections - it is simply not a sustainable setup.  

For us, a CMS/ParNew collection is currently at least 5 times faster than a G1 young collection.

iCMS - we are switching to plain CMS with CMSTriggerInterval to keep the old gen collected at regular intervals without waiting for the occupancy level to be reached. It might even be better than iCMS, as the marking runs without incremental interruptions, which can reduce the amount of remarking since less time passes between the mark and remark phases (I might be wrong).
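A sketch of what that setup looks like on the command line (note that CMSTriggerInterval here is the custom patch mentioned later in the thread, not a stock HotSpot flag; values are illustrative):

```shell
# Plain CMS with a time-based trigger: start an old-gen cycle at least
# every 10 minutes (600000 ms) instead of waiting for the occupancy
# threshold to be reached. CMSTriggerInterval comes from our own patch.
java -XX:+UseConcMarkSweepGC -XX:CMSWaitDuration=60000 \
     -XX:CMSTriggerInterval=600000 -jar app.jar
```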

Regards,
Michal

 
Od: "Kirk Pepperdine" kirk at kodewerk.com
Komu: "Michal Frajt" michal at frajt.eu
Kopie: hotspot-gc-dev at openjdk.java.net
Datum: Thu, 31 Jan 2013 15:47:58 +0100
Předmet: Re: CMS vs G1 - Scan RS very long

> Hi,
> 
> I'd like to add to Michal's comment that I've seen very similar results in recent tuning efforts for low latency. In this case we didn't have a lot of mutation in old gen, but I wasn't able to get young gen pause times anywhere near what I could get with the CMS collector. Unfortunately I've not been able to characterize the problem as well as you have, as we had other fish to fry and I only had a limited amount of time to look at GC. That said, I will still be able to run more experiments in the next two weeks.
> 
> What I did notice is that young gen started reducing its size, but whereas I calculated that a 15m eden was optimal, it stopped downsizing at 40m. I'd be interested if anyone has suggestions on how to get the young gen to shrink when it's not shrinking enough on its own. I'm hesitant to fix the size, as there are times when the heap should grow, but under normal load I would hope it would return to the smaller size.
> 
> Overall I'd have to say that this is an application where I definitely would have recommended iCMS, even though the hardware has 24 cores. It's very disappointing that iCMS has been deprecated even though many are still using it. I did a quick scan of my GC log DB and I'm seeing about 15% of the logs showing an icms_dc tag.
> 
> Regards,
> Kirk
> On 2013-01-31, at 3:12 PM, "Michal Frajt"  wrote:
> 
> > Hi all,
> > 
> > After iCMS got officially deprecated, we decided to compare the G1 collector with our best-tuned (i)CMS setup. Unfortunately we are not able to get the G1 young collection to run anywhere close to ParNew. We actually wanted to compare the G1 concurrent-marking STW pauses with the CMS initial-mark and remark STW pauses, but the incredibly long-running G1 young collections are already unacceptable for us.
> > 
> > We were able to recognize that the very long G1 young collections are caused by scanning the remembered sets. There is not much documentation about G1 internals, but we were able to work out that the size of the remembered sets is related to the number of mutating references from old regions (cards) to young regions. Unfortunately all our applications permanently mutate thousands of references from old objects to young objects.
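The access pattern being described can be sketched with a tiny hypothetical Java program (class and method names are ours, not from the thread): a long-lived array stands in for old-region objects, and each store repoints one of its slots to a fresh allocation. Under G1, every such old-to-young reference store dirties a card that the write barrier feeds into the remembered sets later scanned during the "Scan RS" phase.

```java
import java.util.Random;

// Hypothetical microbenchmark sketch of old->young reference churn.
// oldSlots simulates long-lived (old-region) objects; every store
// repoints a slot to a freshly allocated (young-region) object.
public class OldToYoungChurn {

    // Perform 'stores' random old->young reference updates; return the count.
    static long churn(Object[] oldSlots, int stores) {
        Random rnd = new Random(42);
        long done = 0;
        for (int i = 0; i < stores; i++) {
            // each store is an old->young reference the write barrier tracks
            oldSlots[rnd.nextInt(oldSlots.length)] = new byte[64];
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        Object[] oldSlots = new Object[100_000]; // stays reachable across GCs
        long stores = churn(oldSlots, 1_000_000);
        System.out.println("old->young stores performed: " + stores);
    }
}
```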
> > 
> > We are testing with the latest OpenJDK7u extended by the 7189971 patch and a CMSTriggerInterval implementation. The attached GC log files represent two nearly identical applications processing very similar data sets, one running the G1 collector, the other running CMS. The OpenJDK7u build has an extra output of _pending_cards (when G1TraceConcRefinement is activated), which somehow relates to the remembered set size.
> > 
> > Young Comparison (both 128m, survivor ratio 5, max tenuring 15)
> > CMS - invoked every ~20 sec, avg. stop 60ms
> > G1 - invoked every ~16 sec, avg. stop 410ms !!!
> > 
> > Is there anything that could help us reduce the Scan RS time, or is G1 simply not targeted at applications that heavily mutate old-region objects?
> > 
> > CMS parameters
> > -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=128m -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:CMSWaitDuration=60000 -XX:+CMSScavengeBeforeRemark -XX:CMSTriggerInterval=600000 -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:ParallelCMSThreads=2
> > 
> > G1 parameters (note that MaxNewSize is not specified)
> > -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:PermSize=128m -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1MixedGCCountTarget=16 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2
> > 
> > G1 log file GC young pause
> > [GC pause (young) [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 23697, predicted base time: 32.88 ms, remaining
> > time: 167.12 ms, target pause time: 200.00 ms]
> > [Parallel Time: 389.8 ms, GC Workers: 8]
> >>>>> 
> >    [Scan RS (ms): Min: 328.8, Avg: 330.4, Max: 332.6, Diff: 3.8, Sum: 2642.9]
> >    <<<<
> > [Eden: 119.0M(119.0M)->0.0B(118.0M) Survivors: 9216.0K->10.0M Heap: 1801.6M(2048.0M)->1685.7M(2048.0M)]
> > 
> > Regards,
> > Michal
> > 
> > 
> > 
> 



