Growing GC Young Gen Times

Matt Fowles matt.fowles at gmail.com
Thu Jun 3 14:40:11 PDT 2010


Eric~

That is my suspicion as well.  It would be nice if there were a flag or
something to print stats for JNI handles...
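
For reference, the kind of misuse Eric describes below usually looks
something like this minimal sketch (the class, method, and array names are
hypothetical, not from our code); every JNI handle that is still registered
is a GC root, so the root-scanning part of each young collection grows with
the handle count:

    /* Hypothetical native method that leaks one global ref per call. */
    #include <jni.h>

    static jobject cached[1000000];
    static int     cached_count;

    JNIEXPORT void JNICALL
    Java_com_example_Cache_remember(JNIEnv *env, jclass cls, jobject obj)
    {
        /* Leak: a new global reference on every call, never released,
         * so the handle stays a GC root for the life of the VM. */
        cached[cached_count++] = (*env)->NewGlobalRef(env, obj);
    }

    /* Fix: release handles once they are no longer needed. */
    JNIEXPORT void JNICALL
    Java_com_example_Cache_forget(JNIEnv *env, jclass cls, jint i)
    {
        if (cached[i] != NULL) {
            (*env)->DeleteGlobalRef(env, cached[i]);
            cached[i] = NULL;
        }
    }

Local references can accumulate the same way inside long-running native
code unless they are released with DeleteLocalRef (or bracketed with
PushLocalFrame/PopLocalFrame).  -Xcheck:jni may flag some kinds of misuse,
though as far as I know it doesn't print handle counts either.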

Matt

On Thu, Jun 3, 2010 at 4:57 PM, Eric Caspole <eric.caspole at amd.com> wrote:

> Hi Matt,
> I had a problem like this in a previous job. It turned out to be leaking
> JNI handles. The JNI handles get scanned during the GC even if they are
> stale/useless. So I would carefully inspect your JNI code for incorrect use
> of JNI handles.
>
> I debugged this problem by taking before/after oprofile runs and comparing
> them to see what was taking more time in GC, if that is any help to you.
> Regards,
> Eric
>
>
>
> On Jun 3, 2010, at 1:17 PM, Matt Fowles wrote:
>
>> All~
>>
>> Today we were able to isolate the piece of our system that is causing the
>> increase in GC times.  We can now reproduce the problem on multiple
>> machines much more quickly (in about 20 minutes).  The attached log
>> represents a run from just this component.  There is a full GC part way
>> through the run that we forced using VisualVM.  The following interesting
>> facts are presented for your amusement/edification:
>>
>> 1) the young gen pause times increase steadily over the course of the run
>> 2) the full GC doesn't affect the young gen pause times
>> 3) the component in question makes heavy use of JNI
>>
>> I suspect the 3rd fact is the most interesting one.  We will [obviously]
>> be looking into reducing the test case further and possibly fixing our JNI
>> code.  But this represents a huge leap forward in our understanding of the
>> problem.
>>
>> Matt
>>
>>
>> On Fri, May 14, 2010 at 3:32 PM, Y. Srinivas Ramakrishna <
>> y.s.ramakrishna at oracle.com> wrote:
>> Matt -- Yes, comparative data for all these for 6u20 and JDK 7
>> would be great. Naturally, server 1 is most immediately useful
>> for determining if 6631166 addresses this at all,
>> but the others would be useful too if it turns out it doesn't
>> (i.e. if JDK 7's server 1 turns out to be no better than 6u20's --
>> at which point we should get this into the right channel -- open a bug,
>> and a support case).
>>
>> thanks.
>> -- ramki
>>
>>
>> On 05/14/10 12:24, Matt Fowles wrote:
>> Ramki~
>>
>> I am preparing the flags for the next 3 runs (which run in parallel) and
>> wanted to check a few things with you.  I believe that each of these is
>> collecting a useful data point:
>>
>> Server 1 is running with 8 threads, reduced young gen, and MTT 1.
>> Server 2 is running with 8 threads, reduced young gen, MTT 1, and ParNew,
>> but NOT CMS.
>> Server 3 is running with 8 threads, reduced young gen, MTT 1, and
>> PrintFLSStatistics.
>>
>> I can (additionally) run all of these tests on JDK7 (Java HotSpot(TM)
>> 64-Bit Server VM (build 17.0-b05, mixed mode)).
>>
>> Server 1:
>>           -verbose:gc
>>           -XX:+PrintGCTimeStamps
>>           -XX:+PrintGCDetails
>>           -XX:+PrintGCTaskTimeStamps
>>           -XX:+PrintCommandLineFlags
>>
>>           -Xms32g -Xmx32g -Xmn1g
>>           -XX:+UseParNewGC
>>           -XX:ParallelGCThreads=8
>>           -XX:+UseConcMarkSweepGC
>>           -XX:ParallelCMSThreads=8
>>           -XX:MaxTenuringThreshold=1
>>           -XX:SurvivorRatio=14
>>           -XX:+CMSParallelRemarkEnabled
>>           -Xloggc:gc1.log
>>           -XX:+UseLargePages
>>           -XX:+AlwaysPreTouch
>>
>> Server 2:
>>           -verbose:gc
>>           -XX:+PrintGCTimeStamps
>>           -XX:+PrintGCDetails
>>           -XX:+PrintGCTaskTimeStamps
>>           -XX:+PrintCommandLineFlags
>>
>>           -Xms32g -Xmx32g -Xmn1g
>>           -XX:+UseParNewGC
>>           -XX:ParallelGCThreads=8
>>           -XX:MaxTenuringThreshold=1
>>           -XX:SurvivorRatio=14
>>           -Xloggc:gc2.log
>>           -XX:+UseLargePages
>>           -XX:+AlwaysPreTouch
>>
>>
>> Server 3:
>>           -verbose:gc
>>           -XX:+PrintGCTimeStamps
>>           -XX:+PrintGCDetails
>>           -XX:+PrintGCTaskTimeStamps
>>           -XX:+PrintCommandLineFlags
>>
>>           -Xms32g -Xmx32g -Xmn1g
>>           -XX:+UseParNewGC
>>           -XX:ParallelGCThreads=8
>>           -XX:+UseConcMarkSweepGC
>>           -XX:ParallelCMSThreads=8
>>           -XX:MaxTenuringThreshold=1
>>           -XX:SurvivorRatio=14
>>           -XX:+CMSParallelRemarkEnabled
>>           -Xloggc:gc3.log
>>
>>           -XX:PrintFLSStatistics=2
>>           -XX:+UseLargePages
>>           -XX:+AlwaysPreTouch
>>  Matt
>>
>> On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna <
>> y.s.ramakrishna at oracle.com> wrote:
>>  > On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote:
>>  >>
>>  >> On 05/14/10 10:24, Matt Fowles wrote:
>>  >>>
>>  >>> Jon~
>>  >>>
>>  >>> That makes sense, but the fact is that the old gen *never* gets
>>  >>> collected.  So all the allocations happen from the giant empty space
>>  >>> at the end of the free list.  I thought fragmentation only occurred
>>  >>> when the free lists were added to after freeing memory...
>>  >>
>>  >> As Jon indicated, allocation is done from free lists of blocks
>>  >> that are pre-carved on demand to avoid contention while allocating.
>>  >> The old heuristics for how large to make those lists and the
>>  >> inventory to hold in those lists were not working well as you
>>  >> scaled the number of workers. Following 6631166 we believe it
>>  >> works better and causes both less contention and less
>>  >> fragmentation than it did before, because we do not hold an
>>  >> unnecessary excess inventory of free blocks.
>>  >
>>  > To see what the fragmentation is, try -XX:PrintFLSStatistics=2.
>>  > This will slow down your scavenge pauses (perhaps by quite a bit
>>  > for your 26 GB heap), but you will get a report of the number of
>>  > blocks on free lists and how fragmented the space is on that count
>>  > (for some appropriate notion of fragmentation). Don't use that
>>  > flag in production though :-)
>>  >
>>  > -- ramki
>>  >
>>  >>
>>  >> The fragmentation in turn causes card-scanning to suffer, and the
>>  >> accompanying loss of spatial locality also increases cache misses
>>  >> and TLB misses. (The large page option might help mitigate the
>>  >> latter a bit, especially since you have such a large heap and our
>>  >> fragmented allocation may be exacerbating the TLB pressure.)
>>  >>
>>  >> -- ramki
>>  >>
>>  >>> Matt
>>  >>>
>>  >>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu
>>  >>> <jon.masamitsu at oracle.com> wrote:
>>  >>>>
>>  >>>> Matt,
>>  >>>>
>>  >>>> To amplify on Ramki's comment, the allocations out of the
>>  >>>> old generation are always from a free list.  During a young
>>  >>>> generation collection each GC thread will get its own
>>  >>>> local free lists from the old generation so that it can
>>  >>>> copy objects to the old generation without synchronizing
>>  >>>> with the other GC threads (most of the time).  Objects from
>>  >>>> a GC thread's local free lists are pushed to the global lists
>>  >>>> after the collection (as far as I recall). So there is some
>>  >>>> churn in the free lists.
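>>  >>>>
>>  >>>> Roughly, the idea is something like the following sketch (just a
>>  >>>> conceptual illustration of per-thread free-list carving, not the
>>  >>>> actual HotSpot code; the names are made up):
>>  >>>>
>>  >>>>   #include <pthread.h>
>>  >>>>   #include <stddef.h>
>>  >>>>
>>  >>>>   typedef struct Block { struct Block *next; } Block;
>>  >>>>
>>  >>>>   static Block *global_free_list;          /* shared inventory */
>>  >>>>   static pthread_mutex_t fl_lock = PTHREAD_MUTEX_INITIALIZER;
>>  >>>>
>>  >>>>   /* Carve up to n blocks off the shared list for one worker. */
>>  >>>>   static Block *carve_local_list(size_t n) {
>>  >>>>       pthread_mutex_lock(&fl_lock);
>>  >>>>       Block *head = global_free_list, *tail = head;
>>  >>>>       size_t taken = head ? 1 : 0;
>>  >>>>       while (tail && taken < n && tail->next) {
>>  >>>>           tail = tail->next;
>>  >>>>           taken++;
>>  >>>>       }
>>  >>>>       if (tail) {
>>  >>>>           global_free_list = tail->next;
>>  >>>>           tail->next = NULL;
>>  >>>>       }
>>  >>>>       pthread_mutex_unlock(&fl_lock);
>>  >>>>       return head;              /* may hold fewer than n blocks */
>>  >>>>   }
>>  >>>>
>>  >>>>   /* Promotion by one worker: pop from its local list, no lock. */
>>  >>>>   static void *allocate_from(Block **local) {
>>  >>>>       Block *b = *local;
>>  >>>>       if (b) *local = b->next;
>>  >>>>       return b;                 /* NULL means carve again */
>>  >>>>   }
>>  >>>>
>>  >>>>   /* After the collection, leftover blocks go back to the shared
>>  >>>>    * list -- the churn mentioned above. */
>>  >>>>   static void return_local_list(Block *head) {
>>  >>>>       if (head == NULL) return;
>>  >>>>       Block *tail = head;
>>  >>>>       while (tail->next) tail = tail->next;
>>  >>>>       pthread_mutex_lock(&fl_lock);
>>  >>>>       tail->next = global_free_list;
>>  >>>>       global_free_list = head;
>>  >>>>       pthread_mutex_unlock(&fl_lock);
>>  >>>>   }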
>>  >>>>
>>  >>>> Jon
>>  >>>>
>>  >>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
>>  >>>>>
>>  >>>>> On 05/13/10 10:50, Matt Fowles wrote:
>>  >>>>>>
>>  >>>>>> Jon~
>>  >>>>>>
>>  >>>>>> This may sound naive, but how can fragmentation be an issue if the
>>  >>>>>> old gen has never been collected?  I would think we are still in
>>  >>>>>> the space where we can just bump the old gen alloc pointer...
>>  >>>>>
>>  >>>>> Matt, the old gen allocator may fragment the space. Allocation is
>>  >>>>> not exactly "bump a pointer".
>>  >>>>>
>>  >>>>> -- ramki
>>  >>>>>
>>  >>>>>> Matt
>>  >>>>>>
>>  >>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
>>  >>>>>> <jon.masamitsu at oracle.com> wrote:
>>  >>>>>>>
>>  >>>>>>> Matt,
>>  >>>>>>>
>>  >>>>>>> As Ramki indicated, fragmentation might be an issue.  As the
>>  >>>>>>> fragmentation in the old generation increases, it takes longer to
>>  >>>>>>> find space in the old generation into which to promote objects
>>  >>>>>>> from the young generation.  This is apparently not the problem
>>  >>>>>>> that Wayne is having, but you still might be hitting it.  If you
>>  >>>>>>> can connect jconsole to the VM and force a full GC, that would
>>  >>>>>>> tell us if it's fragmentation.
>>  >>>>>>>
>>  >>>>>>> There might be a scaling issue with UseParNewGC.  If you can use
>>  >>>>>>> -XX:-UseParNewGC (turning off the parallel young
>>  >>>>>>> generation collection) with -XX:+UseConcMarkSweepGC, the pauses
>>  >>>>>>> will be longer but may be more stable.  That's not the solution,
>>  >>>>>>> but just part of the investigation.
>>  >>>>>>>
>>  >>>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC
>>  >>>>>>> and if you don't see the growing young generation pause, that would
>>  >>>>>>> indicate something specific about promotion into the CMS generation.
>>  >>>>>>>
>>  >>>>>>> UseParallelGC is different from UseParNewGC in a number of ways
>>  >>>>>>> and if you try UseParallelGC and still see the growing young
>>  >>>>>>> generation pauses, I'd suspect something special about your
>>  >>>>>>> application.
>>  >>>>>>>
>>  >>>>>>> If you can run these experiments hopefully they will tell
>>  >>>>>>> us where to look next.
>>  >>>>>>>
>>  >>>>>>> Jon
>>  >>>>>>>
>>  >>>>>>>
>>  >>>>>>> On 05/12/10 15:19, Matt Fowles wrote:
>>  >>>>>>>
>>  >>>>>>> All~
>>  >>>>>>>
>>  >>>>>>> I have a large app that produces ~4g of garbage every 30 seconds
>>  >>>>>>> and am trying to reduce the size of GC outliers.  About 99% of
>>  >>>>>>> this data is garbage, but almost anything that survives one
>>  >>>>>>> collection survives for an indeterminately long amount of time.
>>  >>>>>>> We are currently using the following VM and options:
>>  >>>>>>>
>>  >>>>>>> java version "1.6.0_20"
>>  >>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>  >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>  >>>>>>>
>>  >>>>>>>              -verbose:gc
>>  >>>>>>>              -XX:+PrintGCTimeStamps
>>  >>>>>>>              -XX:+PrintGCDetails
>>  >>>>>>>              -XX:+PrintGCTaskTimeStamps
>>  >>>>>>>              -XX:+PrintTenuringDistribution
>>  >>>>>>>              -XX:+PrintCommandLineFlags
>>  >>>>>>>              -XX:+PrintReferenceGC
>>  >>>>>>>              -Xms32g -Xmx32g -Xmn4g
>>  >>>>>>>              -XX:+UseParNewGC
>>  >>>>>>>              -XX:ParallelGCThreads=4
>>  >>>>>>>              -XX:+UseConcMarkSweepGC
>>  >>>>>>>              -XX:ParallelCMSThreads=4
>>  >>>>>>>              -XX:CMSInitiatingOccupancyFraction=60
>>  >>>>>>>              -XX:+UseCMSInitiatingOccupancyOnly
>>  >>>>>>>              -XX:+CMSParallelRemarkEnabled
>>  >>>>>>>              -XX:MaxGCPauseMillis=50
>>  >>>>>>>              -Xloggc:gc.log
>>  >>>>>>>
>>  >>>>>>>
>>  >>>>>>> As you can see from the GC log, we never actually reach the point
>>  >>>>>>> where the CMS kicks in (after app startup).  But our young gens
>>  >>>>>>> seem to take increasingly long to collect as time goes by.
>>  >>>>>>>
>>  >>>>>>> The steady state of the app is reached around 956.392 into the log
>>  >>>>>>> with a collection that takes 0.106 seconds.  Thereafter the
>>  >>>>>>> survivor space remains roughly constantly filled and the amount
>>  >>>>>>> promoted to old gen also remains constant, but the collection
>>  >>>>>>> times increase to 2.855 seconds by the end of the 3.5 hour run.
>>  >>>>>>>
>>  >>>>>>> Has anyone seen this sort of behavior before?  Are there more
>>  >>>>>>> switches that I should try running with?
>>  >>>>>>>
>>  >>>>>>> Obviously, I am working to profile the app and reduce the garbage
>>  >>>>>>> load in parallel.  But if I still see this sort of problem, it is
>>  >>>>>>> only a question of how long the app must run before I see
>>  >>>>>>> unacceptable latency spikes.
>>  >>>>>>>
>>  >>>>>>> Matt
>>  >>>>>>>
>>  >
>>  >
>>
>>
>>
>> <gc.log.gz><ATT00001..txt>
>>
>
>
>