Growing GC Young Gen Times

Chi Ho Kwok chkwok at digibites.nl
Wed May 12 16:35:17 PDT 2010


Hi Matt,

You're seeing a 1-2s pause every ~30 seconds. It's more efficient to collect
more in one pass, but if you want smaller, more frequent pauses, try lowering
the heap sizes. As you never reach more than 8G of used memory before the
pauses get too insane, try the following settings:

-Xmx8g
-Xmn1g

This way, the maximum amount of data the ParNew collector has to process is
only 1/4 of the original, so the worst-case delay is reduced by 75%. And
unless you're leaking memory somewhere, the old generation should contain
only long-lived objects, like data files that should stay loaded. The way it
looks now, the whole old generation is filled with garbage, but as you never
reach 60% used, it's never collected.

I'd drop MaxGCPauseMillis too; the goal is unreachable anyway. You can't
collect a heap of a few GB in 50ms.
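
Putting that together with your existing flags, a starting point could look
something like this (just a sketch - keep your logging flags and adjust from
there):

-Xms8g -Xmx8g -Xmn1g
-XX:+UseParNewGC -XX:ParallelGCThreads=4
-XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=4
-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled
-Xloggc:gc.log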


Our setup here is pretty similar, at least the current heap size, but our
memory allocation pressure is much higher; plus, with an LRU cache as large
as the Java heap allows, we constantly need to promote things from young to
old and collect the old generation with CMS to free up space for more data.
This is how we tuned it:

Basics: -Xmx32g -Xms32g -Xmn1500m
Voodoo: 4 GC threads, CMSInitiatingOccupancyFraction=76,
MaxTenuringThreshold=1 (yes, promote after 1 copy at most; LRU means that if
it sticks around for 10s, it stays), SurvivorRatio=2 (prevents overflow
directly into the old gen)
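
Spelled out as flags, that tuning is roughly the following (the "4 threads"
maps to the ParNew/CMS thread counts here; double-check the exact spelling
against your own command line):

-Xmx32g -Xms32g -Xmn1500m
-XX:+UseParNewGC -XX:ParallelGCThreads=4
-XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=4
-XX:CMSInitiatingOccupancyFraction=76
-XX:MaxTenuringThreshold=1
-XX:SurvivorRatio=2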

We collect the young gen about once every 3 to 5 seconds during peaks,
example line:
2010-05-12T23:05:55.253+0200: 1496100.414: [GC 1496100.414: [ParNew:
1077270K->318127K(1152000K), 0.3551790 secs]
20722984K->20150315K(33170432K), 0.3554220 secs] [Times: user=1.57 sys=0.02,
real=0.36 secs]
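
To read that line: the ParNew part is young-used-before -> young-used-after
(young gen capacity), followed by heap-used-before -> heap-used-after (total
heap capacity) for the whole heap. So the young gen dropped from ~1.0G to
~0.3G used, the total heap went from ~19.8G to ~19.2G of the 32G available,
and the pause cost 0.36s of real time (1.57s of CPU spread over the parallel
GC threads).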

So, to scan and promote about 1GB of data, we use 0.4s. If we used a 4G new
generation, it could be as bad as your logs show: 0.4s x 4 = 1.6s.

Too bad we can't get the minimum delay any lower; with our allocation rate,
if we reduce the new generation size, there's a good chance that a lot of
temporary data leaks through to the old generation, which is much, much more
expensive to collect. Even at the current size, data held for more than 2x
the collection interval, or ~8 seconds, leaks through to the old gen whether
it's supposed to be there or not - only data in the LRU cache should be.

With your allocation rate of 4G/30s = ~136M/sec, you can play with sizes as
small as 512m and just let some objects with a ~10s+ lifetime leak through
to the old gen - CMS does its work in the background anyway, so if you want
to minimize pauses and have spare CPU cycles, go for a tiny new generation.
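
As a rough sanity check (assuming the allocation rate stays around 136M/sec
and ignoring the survivor spaces): a 512m young gen fills up in about
512 / 136 = ~3.8 seconds, so instead of a 1-2s pause every ~30 seconds you'd
get a much shorter ParNew pause every ~4 seconds.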


Chi Ho

On Thu, May 13, 2010 at 12:19 AM, Matt Fowles <matt.fowles at gmail.com> wrote:

> All~
>
> I have a large app that produces ~4g of garbage every 30 seconds and
> am trying to reduce the size of gc outliers.  About 99% of this data
> is garbage, but almost anything that survives one collection survives
> for an indeterminately long amount of time.  We are currently using
> the following VM and options:
>
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>
>               -verbose:gc
>               -XX:+PrintGCTimeStamps
>               -XX:+PrintGCDetails
>               -XX:+PrintGCTaskTimeStamps
>               -XX:+PrintTenuringDistribution
>               -XX:+PrintCommandLineFlags
>               -XX:+PrintReferenceGC
>               -Xms32g -Xmx32g -Xmn4g
>               -XX:+UseParNewGC
>               -XX:ParallelGCThreads=4
>               -XX:+UseConcMarkSweepGC
>               -XX:ParallelCMSThreads=4
>               -XX:CMSInitiatingOccupancyFraction=60
>               -XX:+UseCMSInitiatingOccupancyOnly
>               -XX:+CMSParallelRemarkEnabled
>               -XX:MaxGCPauseMillis=50
>               -Xloggc:gc.log
>
>
> As you can see from the GC log, we never actually reach the point
> where the CMS kicks in (after app startup).  But our young gens seem
> to take increasingly long to collect as time goes by.
>
> The steady state of the app is reached around 956.392 into the log
> with a collection that takes 0.106 seconds.  Thereafter the survivor
> space remains roughly constant in how full it is and the amount promoted to
> old gen also remains constant, but the collection times increase to
> 2.855 seconds by the end of the 3.5 hour run.
>
> Has anyone seen this sort of behavior before?  Are there more switches
> that I should try running with?
>
> Obviously, I am working to profile the app and reduce the garbage load
> in parallel.  But if I still see this sort of problem, it is only a
> question of how long must the app run before I see unacceptable
> latency spikes.
>
> Matt
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>