Growing GC Young Gen Times

Eric Caspole eric.caspole at amd.com
Thu Jun 3 20:57:07 UTC 2010


Hi Matt,
I had a problem like this in a previous job. It turned out to be
leaking JNI handles. JNI handles get scanned during the GC even
if they are stale/useless, so I would carefully inspect your JNI code
for incorrect use of JNI handles.
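
For example, a native method that creates JNI local references in a
loop without releasing them will pile up handles that every GC must
still scan as roots. This is only a sketch (the class and method
names are invented, not taken from your code):

    #include <jni.h>

    /* Each NewStringUTF creates a local reference that stays alive
       until the native method returns.  In a long loop, or if the
       reference is stashed somewhere long-lived, the handle count
       grows and young-gen root scanning gets slower. */
    JNIEXPORT void JNICALL
    Java_com_example_Worker_process(JNIEnv *env, jobject self, jint n)
    {
        for (jint i = 0; i < n; i++) {
            jstring s = (*env)->NewStringUTF(env, "work item");
            /* ... use s ... */
            (*env)->DeleteLocalRef(env, s);  /* release it promptly */
        }
    }

Global references leak the same way if NewGlobalRef is never paired
with DeleteGlobalRef; those stay scannable for the life of the VM.
PushLocalFrame/PopLocalFrame around a batch of JNI calls is another
way to keep the local handle count bounded.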

I debugged this problem by taking before/after oprofile profiles and
comparing the runs to see what was taking more time in the GC, if
that is any help to you.
Regards,
Eric


On Jun 3, 2010, at 1:17 PM, Matt Fowles wrote:

> All~
>
> Today we were able to isolate the piece of our system that is  
> causing the increase in GC times.  We can now reproduce on multiple  
> machines in a much faster manner (about 20 minutes).  The attached  
> log represents a run from just this component.  There is a full GC  
> part way through the run that we forced using VisualVM.  The
> following interesting facts are presented for your
> amusement/edification:
>
> 1) the young gen pause times increase steadily over the course of  
> the run
> 2) the full GC doesn't affect the young gen pause times
> 3) the component in question makes heavy use of JNI
>
> I suspect the 3rd fact is the most interesting one.  We will  
> [obviously] be looking into reducing the test case further and  
> possibly fixing our JNI code.  But this represents a huge leap  
> forward in our understanding of the problem.
>
> Matt
>
>
> On Fri, May 14, 2010 at 3:32 PM, Y. Srinivas Ramakrishna  
> <y.s.ramakrishna at oracle.com> wrote:
> Matt -- Yes, comparative data for all these for 6u20 and jdk 7
> would be great. Naturally, server 1 is most immediately useful
> for determining if 6631166 addresses this at all,
> but others would be useful too if it turns out it doesn't
> (i.e. if jdk 7's server 1 turns out to be no better than 6u20's --
> at which point we should get this into the right channel -- open a
> bug, and a support case).
>
> thanks.
> -- ramki
>
>
> On 05/14/10 12:24, Matt Fowles wrote:
> Ramki~
>
> I am preparing the flags for the next 3 runs (which run in  
> parallel) and wanted to check a few things with you.  I believe  
> that each of these is collecting a useful data point:
>
> Server 1 is running with 8 threads, reduced young gen, and MTT 1.
> Server 2 is running with 8 threads, reduced young gen, MTT 1, and
> ParNew, but NOT CMS.
> Server 3 is running with 8 threads, reduced young gen, MTT 1, and
> PrintFLSStatistics.
>
> I can (additionally) run all of these tests on JDK 7 (Java
> HotSpot(TM) 64-Bit Server VM (build 17.0-b05, mixed mode)).
>
> Server 1:
>            -verbose:gc
>            -XX:+PrintGCTimeStamps
>            -XX:+PrintGCDetails
>            -XX:+PrintGCTaskTimeStamps
>            -XX:+PrintCommandLineFlags
>
>            -Xms32g -Xmx32g -Xmn1g
>            -XX:+UseParNewGC
>            -XX:ParallelGCThreads=8
>            -XX:+UseConcMarkSweepGC
>            -XX:ParallelCMSThreads=8
>            -XX:MaxTenuringThreshold=1
>            -XX:SurvivorRatio=14
>            -XX:+CMSParallelRemarkEnabled
>            -Xloggc:gc1.log
>            -XX:+UseLargePages
>            -XX:+AlwaysPreTouch
>
> Server 2:
>            -verbose:gc
>            -XX:+PrintGCTimeStamps
>            -XX:+PrintGCDetails
>            -XX:+PrintGCTaskTimeStamps
>            -XX:+PrintCommandLineFlags
>
>            -Xms32g -Xmx32g -Xmn1g
>            -XX:+UseParNewGC
>            -XX:ParallelGCThreads=8
>            -XX:MaxTenuringThreshold=1
>            -XX:SurvivorRatio=14
>            -Xloggc:gc2.log
>            -XX:+UseLargePages
>            -XX:+AlwaysPreTouch
>
>
> Server 3:
>            -verbose:gc
>            -XX:+PrintGCTimeStamps
>            -XX:+PrintGCDetails
>            -XX:+PrintGCTaskTimeStamps
>            -XX:+PrintCommandLineFlags
>
>            -Xms32g -Xmx32g -Xmn1g
>            -XX:+UseParNewGC
>            -XX:ParallelGCThreads=8
>            -XX:+UseConcMarkSweepGC
>            -XX:ParallelCMSThreads=8
>            -XX:MaxTenuringThreshold=1
>            -XX:SurvivorRatio=14
>            -XX:+CMSParallelRemarkEnabled
>            -Xloggc:gc3.log
>
>            -XX:PrintFLSStatistics=2
>            -XX:+UseLargePages
>            -XX:+AlwaysPreTouch
>  Matt
>
> On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna  
> <y.s.ramakrishna at oracle.com <mailto:y.s.ramakrishna at oracle.com>>  
> wrote:
>  > On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote:
>  >>
>  >> On 05/14/10 10:24, Matt Fowles wrote:
>  >>>
>  >>> Jon~
>  >>>
>  >>> That makes sense, but the fact is that the old gen *never* gets
>  >>> collected.  So all the allocations happen from the giant empty
>  >>> space at the end of the free list.  I thought fragmentation only
>  >>> occurred when the free lists are added to after freeing memory...
>  >>
>  >> As Jon indicated, allocation is done from free lists of blocks
>  >> that are pre-carved on demand to avoid contention while
>  >> allocating. The old heuristics for how large to make those lists
>  >> and the inventory to hold in those lists were not working well as
>  >> you scaled the number of workers. Following 6631166 we believe it
>  >> works better and causes both less contention and less
>  >> fragmentation than it did before, because we do not hold
>  >> unnecessary excess inventory of free blocks.
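>  >>
>  >> As a rough illustration only (this is not the HotSpot code, and
>  >> every name below is made up), the per-thread carving idea looks
>  >> something like this in C: each GC worker moves a batch of blocks
>  >> from a lock-protected global free list into a private local list,
>  >> and then allocates from that private list with no further locking:
>  >>
>  >>     #include <pthread.h>
>  >>     #include <stddef.h>
>  >>
>  >>     typedef struct Block { struct Block *next; } Block;
>  >>
>  >>     static Block *global_free_list;   /* shared inventory */
>  >>     static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
>  >>
>  >>     /* Per-worker private list; used without any locking. */
>  >>     typedef struct { Block *head; } LocalFreeList;
>  >>
>  >>     /* Carve up to 'batch' blocks off the global list under the lock. */
>  >>     static void refill(LocalFreeList *l, size_t batch) {
>  >>         pthread_mutex_lock(&global_lock);
>  >>         for (size_t i = 0; i < batch && global_free_list != NULL; i++) {
>  >>             Block *b = global_free_list;
>  >>             global_free_list = b->next;
>  >>             b->next = l->head;
>  >>             l->head = b;
>  >>         }
>  >>         pthread_mutex_unlock(&global_lock);
>  >>     }
>  >>
>  >>     /* Fast path: allocate from the private list, refilling on demand. */
>  >>     static Block *alloc_block(LocalFreeList *l) {
>  >>         if (l->head == NULL)
>  >>             refill(l, 64);        /* the batch size is the tuning knob */
>  >>         Block *b = l->head;
>  >>         if (b != NULL)
>  >>             l->head = b->next;
>  >>         return b;                 /* NULL means the global list is empty */
>  >>     }
>  >>
>  >> Holding too large a batch in each worker's private list is exactly
>  >> the kind of excess inventory referred to above.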
>  >
>  > To see what the fragmentation is, try -XX:PrintFLSStatistics=2.
>  > This will slow down your scavenge pauses (perhaps by quite a bit
>  > for your 26 GB heap), but you will get a report of the number of
>  > blocks on free lists and how fragmented the space is on that count
>  > (for some appropriate notion of fragmentation). Don't use that
>  > flag in production though :-)
>  >
>  > -- ramki
>  >
>  >>
>  >> The fragmentation in turn causes card-scanning to suffer,
>  >> and the accompanying loss of spatial locality also
>  >> increases cache misses and TLB misses. (The large page
>  >> option might help mitigate the latter a bit, especially
>  >> since you have such a large heap and our fragmented
>  >> allocation may be exacerbating the TLB pressure.)
>  >>
>  >> -- ramki
>  >>
>  >>> Matt
>  >>>
>  >>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu  
> <jon.masamitsu at oracle.com <mailto:jon.masamitsu at oracle.com>>
>
>  >>> wrote:
>  >>>>
>  >>>> Matt,
>  >>>>
>  >>>> To amplify on Ramki's comment, the allocations out of the
>  >>>> old generation are always from a free list.  During a young
>  >>>> generation collection each GC thread will get its own
>  >>>> local free lists from the old generation so that it can
>  >>>> copy objects to the old generation without synchronizing
>  >>>> with the other GC threads (most of the time).  Objects from
>  >>>> a GC thread's local free lists are pushed to the global lists
>  >>>> after the collection (as far as I recall). So there is some
>  >>>> churn in the free lists.
>  >>>>
>  >>>> Jon
>  >>>>
>  >>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
>  >>>>>
>  >>>>> On 05/13/10 10:50, Matt Fowles wrote:
>  >>>>>>
>  >>>>>> Jon~
>  >>>>>>
>  >>>>>> This may sound naive, but how can fragmentation be an issue
>  >>>>>> if the old gen has never been collected?  I would think we are
>  >>>>>> still in the space where we can just bump the old gen alloc
>  >>>>>> pointer...
>  >>>>>
>  >>>>> Matt, the old gen allocator may fragment the space.  Allocation
>  >>>>> is not exactly "bump a pointer".
>  >>>>>
>  >>>>> -- ramki
>  >>>>>
>  >>>>>> Matt
>  >>>>>>
>  >>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
>  >>>>>> <jon.masamitsu at oracle.com  
> <mailto:jon.masamitsu at oracle.com>> wrote:
>  >>>>>>>
>  >>>>>>> Matt,
>  >>>>>>>
>  >>>>>>> As Ramki indicated, fragmentation might be an issue.  As the
>  >>>>>>> fragmentation in the old generation increases, it takes longer
>  >>>>>>> to find space in the old generation into which to promote
>  >>>>>>> objects from the young generation.  This is apparently not the
>  >>>>>>> problem that Wayne is having, but you still might be hitting
>  >>>>>>> it.  If you can connect jconsole to the VM and force a full
>  >>>>>>> GC, that would tell us if it's fragmentation.
>  >>>>>>>
>  >>>>>>> There might be a scaling issue with the UseParNewGC.  If you
>  >>>>>>> can use -XX:-UseParNewGC (turning off the parallel young
>  >>>>>>> generation collection) with -XX:+UseConcMarkSweepGC the pauses
>  >>>>>>> will be longer but may be more stable.  That's not the
>  >>>>>>> solution but just part of the investigation.
>  >>>>>>>
>  >>>>>>> You could try just -XX:+UseParNewGC without
>  >>>>>>> -XX:+UseConcMarkSweepGC and if you don't see the growing young
>  >>>>>>> generation pause, that would indicate something specific about
>  >>>>>>> promotion into the CMS generation.
>  >>>>>>>
>  >>>>>>> UseParallelGC is different from UseParNewGC in a number of
>  >>>>>>> ways and if you try UseParallelGC and still see the growing
>  >>>>>>> young generation pauses, I'd suspect something special about
>  >>>>>>> your application.
>  >>>>>>>
>  >>>>>>> If you can run these experiments hopefully they will tell
>  >>>>>>> us where to look next.
>  >>>>>>>
>  >>>>>>> Jon
>  >>>>>>>
>  >>>>>>>
>  >>>>>>> On 05/12/10 15:19, Matt Fowles wrote:
>  >>>>>>>
>  >>>>>>> All~
>  >>>>>>>
>  >>>>>>> I have a large app that produces ~4g of garbage every 30
>  >>>>>>> seconds and am trying to reduce the size of GC outliers.
>  >>>>>>> About 99% of this data is garbage, but almost anything that
>  >>>>>>> survives one collection survives for an indeterminately long
>  >>>>>>> amount of time.  We are currently using the following VM and
>  >>>>>>> options:
>  >>>>>>>
>  >>>>>>> java version "1.6.0_20"
>  >>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>  >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>  >>>>>>>
>  >>>>>>>              -verbose:gc
>  >>>>>>>              -XX:+PrintGCTimeStamps
>  >>>>>>>              -XX:+PrintGCDetails
>  >>>>>>>              -XX:+PrintGCTaskTimeStamps
>  >>>>>>>              -XX:+PrintTenuringDistribution
>  >>>>>>>              -XX:+PrintCommandLineFlags
>  >>>>>>>              -XX:+PrintReferenceGC
>  >>>>>>>              -Xms32g -Xmx32g -Xmn4g
>  >>>>>>>              -XX:+UseParNewGC
>  >>>>>>>              -XX:ParallelGCThreads=4
>  >>>>>>>              -XX:+UseConcMarkSweepGC
>  >>>>>>>              -XX:ParallelCMSThreads=4
>  >>>>>>>              -XX:CMSInitiatingOccupancyFraction=60
>  >>>>>>>              -XX:+UseCMSInitiatingOccupancyOnly
>  >>>>>>>              -XX:+CMSParallelRemarkEnabled
>  >>>>>>>              -XX:MaxGCPauseMillis=50
>  >>>>>>>              -Xloggc:gc.log
>  >>>>>>>
>  >>>>>>>
>  >>>>>>> As you can see from the GC log, we never actually reach the
>  >>>>>>> point where the CMS kicks in (after app startup).  But our
>  >>>>>>> young gens seem to take increasingly long to collect as time
>  >>>>>>> goes by.
>  >>>>>>>
>  >>>>>>> The steady state of the app is reached around 956.392 seconds
>  >>>>>>> into the log with a collection that takes 0.106 seconds.
>  >>>>>>> Thereafter the survivor space occupancy remains roughly
>  >>>>>>> constant and the amount promoted to old gen also remains
>  >>>>>>> constant, but the collection times increase to 2.855 seconds
>  >>>>>>> by the end of the 3.5 hour run.
>  >>>>>>>
>  >>>>>>> Has anyone seen this sort of behavior before?  Are there more
>  >>>>>>> switches that I should try running with?
>  >>>>>>>
>  >>>>>>> Obviously, I am working in parallel to profile the app and
>  >>>>>>> reduce the garbage load.  But if I still see this sort of
>  >>>>>>> problem, it is only a question of how long the app must run
>  >>>>>>> before I see unacceptable latency spikes.
>  >>>>>>>
>  >>>>>>> Matt
>  >>>>>>>
>  >
>  >
>
>
>
> <gc.log.gz>


_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


