Does allocation performance vary by collector?
Tony Printezis
tony.printezis at oracle.com
Wed Apr 14 14:54:07 UTC 2010
Matt,
We've been normalizing the G1 parameters (so that they conform to the
existing ones as much as possible). So, the latest G1 observes -Xmn,
NewSize, NewRatio, etc.
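For example, with a build that includes these changes, the young gen sizing
from your common flag set below should carry over to G1 as-is (a sketch of
intent, not a statement about any particular released build):
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-Xms3072m
-Xmx3072m
-Xmn2944m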
FWIW.
Tony
Bollier, Matt wrote:
> My understanding is that -Xmn is not respected when using the G1 collector. Try using -XX:G1YoungGenSize=2944m instead.
>
> -----Original Message-----
> From: hotspot-gc-dev-bounces at openjdk.java.net [mailto:hotspot-gc-dev-bounces at openjdk.java.net] On Behalf Of Matt Khan
> Sent: Tuesday, April 13, 2010 12:46 PM
> To: hotspot-gc-dev at openjdk.java.net
> Subject: Does allocation performance vary by collector?
>
> Hi
>
> I have been revisiting our JVM configuration with the aim of reducing
> pause times; it would be nice to be consistently down below 3ms all the
> time. The allocation behaviour of the application in question involves a
> small amount of static data on startup & then a steady stream of objects
> that have a relatively short lifespan. There are 2 typical lifetimes for
> these objects: about 75% fall into the first group, while the remainder
> have a mean of maybe 70s, though there is quite a long tail to that
> distribution so the typical lifetime is more like <10s. There won't be
> many such objects alive at once but there are quite a few passing through.
> The app runs on a 16 core Opteron box running Solaris 10 with 6u18.
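>
> To make that pattern concrete, the shape of the allocation is roughly the
> following (purely illustrative - the class, object sizes and rates below
> are made up, not our actual code):
>
>     import java.util.ArrayDeque;
>
>     // Steady stream of small objects; most die almost immediately,
>     // a minority are held on to for a while before being dropped.
>     public class AllocationPattern {
>         static final ArrayDeque<byte[]> longerLived = new ArrayDeque<byte[]>();
>
>         public static void main(String[] args) throws InterruptedException {
>             while (true) {
>                 byte[] msg = new byte[512];       // typical short-lived object
>                 if (Math.random() < 0.25) {       // the longer-lived minority
>                     longerLived.add(msg);
>                     if (longerLived.size() > 100000) {
>                         longerLived.poll();       // eventually drop the oldest
>                     }
>                 }
>                 Thread.sleep(0, 50000);           // steady, modest rate
>             }
>         }
>     }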
>
> Therefore I've been benching different configurations with a massive eden
> and relatively tiny tenured & trying different collectors to see how they
> perform. These params were common to each run
>
> -Xms3072m
> -Xmx3072m
> -Xmn2944m
> -XX:+DisableExplicitGC
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCApplicationConcurrentTime
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=190
> -XX:TargetSurvivorRatio=90
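>
> (As a sanity check on what those survivor settings imply, assuming the
> usual HotSpot relationship young = eden + 2 * survivor with
> eden = SurvivorRatio * survivor: each survivor space is about
> 2944m / (190 + 2) ~= 15m, so effectively all of the 2944m young gen is
> eden.)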
>
> I then tried the following
>
> # Parallel Scavenge
> -XX:+UseParallelGC
> -XX:+UseParallelOldGC
>
> # Parallel Scavenge with NUMA
> -XX:+UseParallelGC
> -XX:+UseNUMA
> -XX:+UseParallelOldGC
>
> # Incremental CMS/ParNew
> -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode
> -XX:+CMSIncrementalPacing
> -XX:+UseParNewGC
>
> # G1
> -XX:+UnlockExperimentalVMOptions
> -XX:+UseG1GC
>
> The last two (CMS/G1) were repeated on 6u18 & 6u20b02 for completeness as
> I see there were assorted fixes to G1 in 6u20b01.
>
> I measure the time it takes to execute assorted points in my flow & see
> fairly significant differences in latency with each collector (a sketch
> of this sort of measurement loop follows the numbers below), for example
>
> 1) CMS == ~380-400micros
> 2) Parallel + NUMA == ~400micros
> 3) Parallel == ~450micros
> 4) G1 == ~550micros
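>
> The measurement itself is nothing exotic; a minimal, self-contained
> version of the sort of loop involved (the work being timed here is just a
> stand-in, not our real flow) would be:
>
>     // Time a unit of work with System.nanoTime() and report the median
>     // latency in microseconds.
>     public class LatencyProbe {
>         static volatile long sink;
>
>         public static void main(String[] args) {
>             long[] samples = new long[10000];
>             for (int i = 0; i < samples.length; i++) {
>                 long start = System.nanoTime();
>                 doUnitOfWork();                   // stand-in for one step of the flow
>                 samples[i] = (System.nanoTime() - start) / 1000;
>             }
>             java.util.Arrays.sort(samples);
>             System.out.println("median us = " + samples[samples.length / 2]);
>         }
>
>         static void doUnitOfWork() {
>             byte[] b = new byte[256];             // allocates, like the app does
>             sink += b.length;
>         }
>     }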
>
> The times above are taken well after the JVM has warmed up (latencies have
> stabilised, compilation activity is practically non-existent) & there is
> no significant "other" activity on the server at the time. The differences
> don't appear to be pause related, as the shape of the distribution (around
> those averages) is the same; it's as if each collector settles into quite
> a different steady-state performance. This appears to be repeatable,
> though given the time it takes to run this sort of benchmark I admit to
> only having seen it repeated a few times. I have run previous benchmarks
> where the run is repeated 20 times (keeping GC constant in that case, as I
> was testing something else) without seeing variations that big across
> runs, which makes me suspect the collection algorithm as the culprit.
>
> So the point of this relatively long setup is to ask whether there are
> theoretical reasons why the choice of garbage collection algorithm should
> vary measured latency like this. I had been working on the assumption that
> eden allocation is a "bump the pointer as you take it from a TLAB" type of
> event, hence generally cheap, and that it doesn't really vary by
> algorithm.
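>
> Roughly, the fast path I have in mind is the following (sketched in Java
> as a model, not HotSpot's actual implementation, which lives in
> C++/assembly):
>
>     // Model of a thread-local allocation buffer: allocation is a bounds
>     // check plus a pointer bump; anything else falls to a slow path.
>     final class Tlab {
>         private long top;        // next free address in this thread's TLAB
>         private final long end;  // end of the TLAB
>
>         Tlab(long start, long size) {
>             this.top = start;
>             this.end = start + size;
>         }
>
>         // Returns the address of the new object, or -1 to signal the
>         // slow path (refill the TLAB from eden, or allocate elsewhere).
>         long allocate(long size) {
>             long newTop = top + size;
>             if (newTop > end) {
>                 return -1;
>             }
>             long obj = top;
>             top = newTop;        // "bump the pointer"
>             return obj;
>         }
>     }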
>
> FWIW the ParNew/CMS config is still the best one for keeping down pause
> times, though the parallel one was close. The former peaks at intermittent
> pauses of 20-30ms, the latter at about 40ms. The Parallel + NUMA one
> curiously involved many fewer pauses, so much less time was spent paused
> overall, but it peaked higher (~120ms), which is really unacceptable. I
> don't really understand why that is, but I speculated that it's down to
> the fact that one of our key domain objects is allocated in a different
> thread from the one where it is primarily used. Is this right?
>
> If there is some other data that I should post to back up some of the
> above then please tell me and I'll add the info if I have it (and repeat
> the test if I don't).
>
> Cheers
> Matt
>
> Matt Khan
> --------------------------------------------------
> GFFX Auto Trading
> Deutsche Bank, London
>
>