Growing GC Young Gen Times

Ryan Rawson ryanobjc at gmail.com
Thu May 13 16:54:34 PDT 2010


Hi,

I have had a similar experience, and I arrived at a workable but
unsatisfying solution...

In my case, the young GC times kept getting longer and longer.  With GC
logging I saw something similar to what you saw: the ParNew generation
kept growing and a huge amount of data was being tenured.  An 800ms GC
pause every 4 seconds was no good for me, so I eventually added this to
my java command line:

"-XX:NewSize=64m -XX:MaxNewSize=64m"

A friend suggested this to me - he said that a young GC should be fast
because the young gen should be roughly the size of the L3 cache.

With this setting I see young GCs between 0.5 and 3 times a second, each
lasting 10-80ms or so.  When the CMS runs it prunes massive amounts of
garbage, up to 2GB of RAM in my 6GB-heap processes.


Now for a little theorycrafting... The root cause here is that my
application breaks the Object Generational Hypothesis.  The GC
auto-tuning will grow the ParNew generation to reduce the amount of data
it is tenuring, but it is never really able to reach a good steady
state.  At that point you are tenuring 1-2GB of RAM, and tenuring =
copying objects = time consuming.

Once you are at this spot, you find out that every current shipping GC
is just not good enough.  My hope was to use G1, but considering how
unstable it was for me (I have tried 12+ releases of Java7, a few
releases of Java6) I am now shifting my approach.

In my application one of the primary causes of allocation is a block
cache for a database-type application. I am planning on testing a
change where the block cache is maintained in massive
DirectByteBuffers (think sizes from 2-15GB of RAM) and I will manage
all the allocation by hand (in Java).  If you have some ability to
shift memory usage out of the domain of the GC I would highly suggest
doing so.
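To make the off-heap idea concrete, here is a minimal sketch of the kind of hand-managed block cache described above, assuming fixed-size slots and a simple bump allocator.  The class and method names (SlabCache, put, get) are illustrative, not from any real project:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a fixed-slot block cache kept off the Java heap.
final class SlabCache {
    private final ByteBuffer slab;  // one large off-heap region
    private final int blockSize;
    private final int capacity;     // number of slots
    private int nextFree = 0;       // bump allocation over slots (no reuse here)

    SlabCache(int blockSize, int capacity) {
        this.blockSize = blockSize;
        this.capacity = capacity;
        // allocateDirect places the bytes outside the Java heap, so the
        // GC never scans or copies the cached data itself.
        this.slab = ByteBuffer.allocateDirect(blockSize * capacity);
    }

    /** Copies a block into the next free slot; returns the slot index, or -1 if full. */
    int put(byte[] block) {
        if (nextFree >= capacity || block.length > blockSize) return -1;
        int slot = nextFree++;
        ByteBuffer view = slab.duplicate();  // independent position/limit
        view.position(slot * blockSize);
        view.put(block);                     // copy in: off-heap data must be copied
        return slot;
    }

    /** Copies a block back out of the slab into a fresh on-heap array. */
    byte[] get(int slot, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = slab.duplicate();
        view.position(slot * blockSize);
        view.get(out);                       // copy out
        return out;
    }
}
```

Note that a single ByteBuffer is indexed by int and so tops out around 2GB, so a multi-GB cache of the size mentioned above would need an array of slabs, plus a real free-list instead of the bump pointer.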

At this point I can honestly say that if you are not Object Generational
Hypothesis Compliant (OGH (tm)), then Java with large heaps can be very,
very painful.  I think the choices are DirectByteBuffer, JNI, and not
using Java.  I'd like another option, but I'm not sure what it might be
while still avoiding that last choice (and ideally avoiding JNI too).

I feel this is the greatest weakness of Java - the memory management is
one size fits all, and there are few great options.
DirectByteBuffers have a limited interface and require copying data in
and out to talk to the rest of Java.  JNI has the same issue and has
historically had a high invocation cost.
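The copy-in/copy-out limitation can be seen directly: a direct buffer has no backing byte[] array, so anything that wants to hand its contents to array-based Java APIs must copy.  A minimal illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Demonstrates that a direct buffer exposes no zero-copy byte[] view,
// so data must be copied in and back out to interoperate with the rest
// of Java.
public class DirectCopyDemo {
    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        System.out.println(direct.hasArray());  // false: no backing array

        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
        direct.put(payload);                    // copy in

        direct.flip();                          // prepare for reading
        byte[] back = new byte[direct.remaining()];
        direct.get(back);                       // copy out
        System.out.println(new String(back, StandardCharsets.US_ASCII));
    }
}
```

Running this prints `false` and then `hello`: the round trip works, but every byte crossed the heap/off-heap boundary twice.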

Good luck out there, and stay OGH compliant!
-ryan

On Thu, May 13, 2010 at 3:29 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
> Matt,
>
> To amplify on Ramki's comment, the allocations out of the
> old generation are always from a free list.  During a young
> generation collection each GC thread will get its own
> local free lists from the old generation so that it can
> copy objects to the old generation without synchronizing
> with the other GC threads (most of the time).  Objects from
> a GC thread's local free lists are pushed to the global lists
> after the collection (as far as I recall). So there is some
> churn in the free lists.
>
> Jon
>
> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
>> On 05/13/10 10:50, Matt Fowles wrote:
>>> Jon~
>>>
>>> This may sound naive, but how can fragmentation be an issue if the old
>>> gen has never been collected?  I would think we are still in the space
>>> where we can just bump the old gen alloc pointer...
>>
>> Matt, The old gen allocator may fragment the space. Allocation is not
>> exactly "bump a pointer".
>>
>> -- ramki
>>
>>>
>>> Matt
>>>
>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
>>> <jon.masamitsu at oracle.com> wrote:
>>>> Matt,
>>>>
>>>> As Ramki indicated, fragmentation might be an issue.  As the
>>>> fragmentation in the old generation increases, it takes longer to
>>>> find space in the old generation into which to promote objects from
>>>> the young generation.  This is apparently not the problem that
>>>> Wayne is having, but you still might be hitting it.  If you can
>>>> connect jconsole to the VM and force a full GC, that would tell us
>>>> if it's fragmentation.
>>>>
>>>> There might be a scaling issue with UseParNewGC.  If you can use
>>>> -XX:-UseParNewGC (turning off the parallel young generation
>>>> collection) with -XX:+UseConcMarkSweepGC, the pauses will be longer
>>>> but may be more stable.  That's not the solution, but just part of
>>>> the investigation.
>>>>
>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC
>>>> and if you don't see the growing young generation pause, that would
>>>> indicate
>>>> something specific about promotion into the CMS generation.
>>>>
>>>> UseParallelGC is different from UseParNewGC in a number of ways
>>>> and if you try UseParallelGC and still see the growing young generation
>>>> pauses, I'd suspect something special about your application.
>>>>
>>>> If you can run these experiments hopefully they will tell
>>>> us where to look next.
>>>>
>>>> Jon
>>>>
>>>>
>>>> On 05/12/10 15:19, Matt Fowles wrote:
>>>>
>>>> All~
>>>>
>>>> I have a large app that produces ~4g of garbage every 30 seconds and
>>>> am trying to reduce the size of gc outliers.  About 99% of this data
>>>> is garbage, but almost anything that survives one collection survives
>>>> for an indeterminately long amount of time.  We are currently using
>>>> the following VM and options:
>>>>
>>>> java version "1.6.0_20"
>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>>
>>>>                -verbose:gc
>>>>                -XX:+PrintGCTimeStamps
>>>>                -XX:+PrintGCDetails
>>>>                -XX:+PrintGCTaskTimeStamps
>>>>                -XX:+PrintTenuringDistribution
>>>>                -XX:+PrintCommandLineFlags
>>>>                -XX:+PrintReferenceGC
>>>>                -Xms32g -Xmx32g -Xmn4g
>>>>                -XX:+UseParNewGC
>>>>                -XX:ParallelGCThreads=4
>>>>                -XX:+UseConcMarkSweepGC
>>>>                -XX:ParallelCMSThreads=4
>>>>                -XX:CMSInitiatingOccupancyFraction=60
>>>>                -XX:+UseCMSInitiatingOccupancyOnly
>>>>                -XX:+CMSParallelRemarkEnabled
>>>>                -XX:MaxGCPauseMillis=50
>>>>                -Xloggc:gc.log
>>>>
>>>>
>>>> As you can see from the GC log, we never actually reach the point
>>>> where the CMS kicks in (after app startup).  But our young gens seem
>>>> to take increasingly long to collect as time goes by.
>>>>
>>>> The steady state of the app is reached around 956.392 into the log
>>>> with a collection that takes 0.106 seconds.  Thereafter the survivor
>>>> space remains roughly constant in how full it is, and the amount promoted to
>>>> old gen also remains constant, but the collection times increase to
>>>> 2.855 seconds by the end of the 3.5 hour run.
>>>>
>>>> Has anyone seen this sort of behavior before?  Are there more switches
>>>> that I should try running with?
>>>>
>>>> Obviously, I am working to profile the app and reduce the garbage load
>>>> in parallel.  But if I still see this sort of problem, it is only a
>>>> question of how long the app must run before I see unacceptable
>>>> latency spikes.
>>>>
>>>> Matt
>>>>
>>>> ________________________________
>>>> _______________________________________________
>>>> hotspot-gc-use mailing list
>>>> hotspot-gc-use at openjdk.java.net
>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>

