Does allocation performance vary by collector?

Tony Printezis tony.printezis at oracle.com
Wed Apr 14 14:54:07 UTC 2010


Matt,

We've been normalizing the G1 parameters (so that they conform to the 
existing ones as much as possible). So, the latest G1 observes -Xmn, 
NewSize, NewRatio, etc.
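
For example, with a build that has the normalized parameters, something along
these lines (a sketch only, reusing the heap sizes from Matt's mail below)
should size the young gen for G1 the same way it does for the other
collectors:

  java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC \
       -Xms3072m -Xmx3072m -Xmn2944m ...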

FWIW.

Tony

Bollier, Matt wrote:
> My understanding is that -Xmn is not respected when using the G1 collector.  Try using -XX:G1YoungGenSize=2944m instead.
>
> -----Original Message-----
> From: hotspot-gc-dev-bounces at openjdk.java.net [mailto:hotspot-gc-dev-bounces at openjdk.java.net] On Behalf Of Matt Khan
> Sent: Tuesday, April 13, 2010 12:46 PM
> To: hotspot-gc-dev at openjdk.java.net
> Subject: Does allocation performance vary by collector?
>
> Hi
>
> I have been revisiting our jvm configuration with the aim of reducing 
> pause times; it would be nice to be consistently below 3ms all the 
> time. The allocation behaviour of the application in question involves a 
> small amount of static data on startup & then a steady stream of objects 
> with relatively short lifespans. These objects fall into two typical 
> lifetime groups: about 75% are in the first, while the remainder have a 
> mean of maybe 70s, though there is quite a long tail to this, so the 
> typical lifetime is more like <10s. There won't be many such objects 
> alive at once, but there are quite a few passing through. The app runs 
> on a 16-core Opteron box running Solaris 10 with 6u18.
>
> Therefore I've been benching different configurations with a massive eden 
> and a relatively tiny tenured generation, & trying different collectors to 
> see how they perform. These params were common to each run:
>
> -Xms3072m 
> -Xmx3072m 
> -Xmn2944m 
> -XX:+DisableExplicitGC 
> -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps 
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCApplicationConcurrentTime
> -XX:MaxTenuringThreshold=1 
> -XX:SurvivorRatio=190 
> -XX:TargetSurvivorRatio=90
>
> I then tried the following
>
> # Parallel Scavenge 
> -XX:+UseParallelGC 
> -XX:+UseParallelOldGC 
>
> # Parallel Scavenge with NUMA
> -XX:+UseParallelGC 
> -XX:+UseNUMA 
> -XX:+UseParallelOldGC 
>
> # Incremental CMS/ParNew
> -XX:+UseConcMarkSweepGC 
> -XX:+CMSIncrementalMode 
> -XX:+CMSIncrementalPacing 
> -XX:+UseParNewGC 
>
> # G1
> -XX:+UnlockExperimentalVMOptions 
> -XX:+UseG1GC 
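>
> For reference, a full launch line for one of these runs (the CMS/ParNew one, 
> say) therefore looked roughly like the following; the classpath and main 
> class here are just placeholders:
>
>   java -Xms3072m -Xmx3072m -Xmn2944m \
>        -XX:+DisableExplicitGC \
>        -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
>        -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime \
>        -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=190 -XX:TargetSurvivorRatio=90 \
>        -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing \
>        -XX:+UseParNewGC \
>        -cp app.jar com.example.Main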
>
> The last two (CMS/G1) were repeated on 6u18 & 6u20b02 for completeness as 
> I see there were assorted fixes to G1 in 6u20b01.
>
> I measure the time it takes to execute assorted points in my flow & see 
> fairly significant differences in latencies with each collector, for 
> example
>
> 1) CMS == ~380-400micros 
> 2) Parallel + NUMA == ~400micros
> 3) Parallel == ~450micros
> 4) G1 == ~550micros
>
> The times above are taken well after the jvm has warmed up (latencies have 
> stabilised, compilation activity is practically non-existent) & there is 
> no significant "other" activity on the server at the time. The differences 
> don't appear to be pause-related, as the shape of the distribution (around 
> those averages) is the same; it's as if each collector settles into quite a 
> different steady-state performance. This appears to be repeatable, though 
> given the time it takes to run this sort of benchmark, I admit to having 
> seen it repeated only a few times. I have run previous benchmarks that 
> repeat 20x (keeping GC constant in that case, as I was testing 
> something else) without seeing variations that big across runs, which makes 
> me suspect the collection algorithm as the culprit.
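>
> In case it helps, the measurement itself is roughly the following kind of 
> thing (a simplified, self-contained sketch; in the real app the timed block 
> is a step in the trading flow rather than the stand-in allocation shown here):
>
>   import java.util.Arrays;
>   import java.util.concurrent.TimeUnit;
>
>   // Simplified sketch of the timing, not the real code.
>   public final class LatencySketch {
>       public static void main(String[] args) {
>           final int samples = 200000;
>           long[] micros = new long[samples];
>           long checksum = 0;                       // keep the work observable
>           for (int i = 0; i < samples; i++) {
>               long start = System.nanoTime();
>               byte[] work = new byte[256];         // stand-in for one point in the flow
>               work[0] = (byte) i;
>               checksum += work[0];
>               micros[i] = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
>           }
>           Arrays.sort(micros);
>           System.out.println("median=" + micros[samples / 2] + "us, p99="
>                   + micros[(int) (samples * 0.99)] + "us, checksum=" + checksum);
>       }
>   }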
>
> So the point of this relatively long setup is to ask whether there are 
> theoretical reasons why the choice of garbage collection algorithm should 
> affect measured latency like this. I had been working on the assumption that 
> eden allocation is a "bump the pointer as you take it from a TLAB" type of 
> event, hence generally cheap & not really something that varies by algorithm.
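>
> i.e. my mental model of the fast path is conceptually something like this 
> (a sketch only, not actual HotSpot code):
>
>   // Conceptual sketch: allocation from a TLAB is just a pointer bump within a
>   // thread-local chunk of eden, with a slow path when the chunk runs out.
>   final class TlabSketch {
>       private long top;   // next free address in this thread's TLAB
>       private long end;   // end of this thread's TLAB
>
>       long allocate(long sizeInBytes) {
>           long obj = top;
>           if (obj + sizeInBytes <= end) {
>               top = obj + sizeInBytes;          // fast path: bump the pointer
>               return obj;
>           }
>           return allocateSlowPath(sizeInBytes); // refill the TLAB / go to shared eden
>       }
>
>       private long allocateSlowPath(long sizeInBytes) {
>           // would request a new TLAB from eden (a shared, atomic bump) and retry;
>           // details omitted in this sketch
>           throw new UnsupportedOperationException("sketch only");
>       }
>   }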
>
> fwiw the ParNew/CMS config is still the best one for keeping down pause 
> times, though the parallel one was close. The former peaks at intermittent 
> pauses of 20-30ms, the latter at about 40ms. The Parallel + NUMA one 
> curiously involved many fewer pauses, so much less time was spent paused 
> overall, but it peaked higher (~120ms), which is really unacceptable. 
> I don't really understand why that is, but I speculated that it's down to 
> the fact that one of our key domain objects is allocated in a different 
> thread to the one where it is primarily used. Is this right?
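>
> The pattern I mean is basically this kind of thing (a sketch only; the 
> class and thread names are made up):
>
>   import java.util.concurrent.ArrayBlockingQueue;
>   import java.util.concurrent.BlockingQueue;
>
>   // Sketch of the allocation pattern: the domain object is allocated on the
>   // feed thread but consumed on a different trading thread, so with
>   // -XX:+UseNUMA it presumably ends up in the allocating thread's local eden
>   // area rather than near the thread that actually uses it.
>   public final class CrossThreadAllocation {
>       static final class Order {
>           final double price;
>           Order(double price) { this.price = price; }
>       }
>
>       public static void main(String[] args) {
>           final BlockingQueue<Order> queue = new ArrayBlockingQueue<Order>(1024);
>
>           Thread feed = new Thread(new Runnable() {      // allocating thread
>               public void run() {
>                   for (int i = 0; i < 1000; i++) {
>                       try { queue.put(new Order(i * 0.25)); }
>                       catch (InterruptedException e) { return; }
>                   }
>               }
>           });
>
>           Thread trader = new Thread(new Runnable() {    // consuming thread
>               public void run() {
>                   double sum = 0;
>                   for (int i = 0; i < 1000; i++) {
>                       try { sum += queue.take().price; }
>                       catch (InterruptedException e) { return; }
>                   }
>                   System.out.println(sum);
>               }
>           });
>
>           feed.start();
>           trader.start();
>       }
>   }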
>
> If there is some other data that I should post to back up some of the 
> above, then pls tell me and I'll add the info if I have it (and repeat the 
> test if I don't).
>
> Cheers
> Matt
>
> Matt Khan
> --------------------------------------------------
> GFFX Auto Trading
> Deutsche Bank, London