Growing GC Young Gen Times
Y. Srinivas Ramakrishna
y.s.ramakrishna at oracle.com
Fri May 14 12:32:45 PDT 2010
Matt -- Yes, comparative data for all these for 6u20 and jdk 7
would be great. Naturally, server 1 is most immediately useful
for determining if 6631166 addresses this at all,
but others would be useful too if it turns out it doesn't
(i.e. if jdk 7's server 1 turns out to be no better than 6u20's --
at which point we should get this into the right channel -- open a bug,
and a support case).
thanks.
-- ramki
On 05/14/10 12:24, Matt Fowles wrote:
> Ramki~
>
> I am preparing the flags for the next 3 runs (which run in parallel) and
> wanted to check a few things with you. I believe that each of these is
> collecting a useful data point.
>
> Server 1 is running with 8 threads, reduced young gen, and MTT 1.
> Server 2 is running with 8 threads, reduced young gen, MTT 1, and
> ParNew, but NOT CMS.
> Server 3 is running with 8 threads, reduced young gen, MTT 1,
> and PrintFLSStatistics.
>
> I can (additionally) run all of these tests on JDK7 (Java HotSpot(TM)
> 64-Bit Server VM (build 17.0-b05, mixed mode)).
>
> Server 1:
> -verbose:gc
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTaskTimeStamps
> -XX:+PrintCommandLineFlags
>
> -Xms32g -Xmx32g -Xmn1g
> -XX:+UseParNewGC
> -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC
> -XX:ParallelCMSThreads=8
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=14
> -XX:+CMSParallelRemarkEnabled
> -Xloggc:gc1.log
> -XX:+UseLargePages
> -XX:+AlwaysPreTouch
>
> Server 2:
> -verbose:gc
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTaskTimeStamps
> -XX:+PrintCommandLineFlags
>
> -Xms32g -Xmx32g -Xmn1g
> -XX:+UseParNewGC
> -XX:ParallelGCThreads=8
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=14
> -Xloggc:gc2.log
> -XX:+UseLargePages
> -XX:+AlwaysPreTouch
>
>
> Server 3:
> -verbose:gc
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTaskTimeStamps
> -XX:+PrintCommandLineFlags
>
> -Xms32g -Xmx32g -Xmn1g
> -XX:+UseParNewGC
> -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC
> -XX:ParallelCMSThreads=8
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=14
> -XX:+CMSParallelRemarkEnabled
> -Xloggc:gc3.log
>
> -XX:PrintFLSStatistics=2
> -XX:+UseLargePages
> -XX:+AlwaysPreTouch
>
> Matt
>
> On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
> > On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote:
> >>
> >> On 05/14/10 10:24, Matt Fowles wrote:
> >>>
> >>> Jon~
> >>>
> >>> That makes sense, but the fact is that the old gen *never* gets
> >>> collected. So all the allocations happen from the giant empty space
> >>> at the end of the free list. I thought fragmentation only occurred
> >>> when the free lists are added to after freeing memory...
> >>
> >> As Jon indicated, allocation is done from free lists of blocks
> >> that are pre-carved on demand to avoid contention while allocating.
> >> The old heuristics for how large to make those lists and how much
> >> inventory to hold in them were not working well as you
> >> scaled the number of workers. Following 6631166 we believe it
> >> works better and causes both less contention and less
> >> fragmentation than it did before, because we do not hold
> >> unnecessary excess inventory of free blocks.
> >
> > To see what the fragmentation is, try -XX:PrintFLSStatistics=2.
> > This will slow down your scavenge pauses (perhaps by quite a bit
> > for your 26 GB heap), but you will get a report of the number of
> > blocks on free lists and how fragmented the space is on that count
> > (for some appropriate notion of fragmentation). Don't use that
> > flag in production though :-)
> >
> > -- ramki
> >
> >>
> >> The fragmentation in turn causes card-scanning to suffer
> >> adversely, besides the issues with loss of spatial locality also
> >> increasing cache misses and TLB misses. (The large page
> >> option might help mitigate the latter a bit, especially
> >> since you have such a large heap and our fragmented
> >> allocation may be exacerbating the TLB pressure.)
> >>
> >> -- ramki
> >>
> >>> Matt
> >>>
> >>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu
> >>> <jon.masamitsu at oracle.com> wrote:
> >>>>
> >>>> Matt,
> >>>>
> >>>> To amplify on Ramki's comment, the allocations out of the
> >>>> old generation are always from a free list. During a young
> >>>> generation collection each GC thread will get its own
> >>>> local free lists from the old generation so that it can
> >>>> copy objects to the old generation without synchronizing
> >>>> with the other GC threads (most of the time). Objects from
> >>>> a GC thread's local free lists are pushed to the global lists
> >>>> after the collection (as far as I recall). So there is some
> >>>> churn in the free lists.
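[Editor's note: a minimal, single-threaded sketch of the per-thread free-list scheme described above. This is purely illustrative, not HotSpot source; all names and the batch sizes are invented for the example.]

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: each GC thread carves a private batch of free blocks off a
// global list, promotes objects from it without locking, then returns
// the leftovers to the global list after the collection. The
// carve/return cycle is the "churn" mentioned above; holding too much
// local inventory (the pre-6631166 behavior) worsens fragmentation.
public class FreeListSketch {
    static final Deque<Integer> globalFreeList = new ArrayDeque<>();

    // Carve a batch of blocks off the global list for one GC thread.
    static Deque<Integer> carveLocalList(int batch) {
        Deque<Integer> local = new ArrayDeque<>();
        for (int i = 0; i < batch && !globalFreeList.isEmpty(); i++) {
            local.push(globalFreeList.pop()); // synchronized in reality
        }
        return local;
    }

    public static void main(String[] args) {
        for (int b = 0; b < 100; b++) globalFreeList.push(b); // 100 free blocks

        Deque<Integer> local = carveLocalList(30); // thread's private inventory
        for (int i = 0; i < 12; i++) local.pop();  // promote 12 objects, lock-free

        // After the collection, unused blocks go back to the global list.
        while (!local.isEmpty()) globalFreeList.push(local.pop());

        System.out.println("blocks left on global list: " + globalFreeList.size());
    }
}
```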
> >>>>
> >>>> Jon
> >>>>
> >>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
> >>>>>
> >>>>> On 05/13/10 10:50, Matt Fowles wrote:
> >>>>>>
> >>>>>> Jon~
> >>>>>>
> >>>>>> This may sound naive, but how can fragmentation be an issue
> >>>>>> if the old gen has never been collected? I would think we are
> >>>>>> still in the space where we can just bump the old gen alloc
> >>>>>> pointer...
> >>>>>
> >>>>> Matt, the old gen allocator may fragment the space. Allocation is not
> >>>>> exactly "bump a pointer".
> >>>>>
> >>>>> -- ramki
> >>>>>
> >>>>>> Matt
> >>>>>>
> >>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
> >>>>>> <jon.masamitsu at oracle.com> wrote:
> >>>>>>>
> >>>>>>> Matt,
> >>>>>>>
> >>>>>>> As Ramki indicated, fragmentation might be an issue. As the
> >>>>>>> fragmentation in the old generation increases, it takes longer
> >>>>>>> to find space in the old generation into which to promote
> >>>>>>> objects from the young generation. This is apparently not the
> >>>>>>> problem that Wayne is having, but you still might be hitting
> >>>>>>> it. If you can connect jconsole to the VM and force a full GC,
> >>>>>>> that would tell us if it's fragmentation.
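[Editor's note: jconsole's "Perform GC" button invokes, as far as I know, the gc() operation of the java.lang:type=Memory MBean; the sketch below requests the same full collection programmatically via the standard java.lang.management API. If old-gen usage stays high after a forced full GC, the space is mostly live data; if fragmentation is the problem, free space exists but is scattered across small blocks.]

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Requests the same full collection that jconsole's "Perform GC"
// button triggers, and reports heap usage before and after.
public class ForceFullGC {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        for (int i = 0; i < 1000; i++) {
            byte[] junk = new byte[64 * 1024]; // short-lived garbage
        }
        long before = mem.getHeapMemoryUsage().getUsed();
        mem.gc(); // requests a full GC (honors -XX:+DisableExplicitGC)
        long after = mem.getHeapMemoryUsage().getUsed();
        System.out.println("heap used before full GC: " + before);
        System.out.println("heap used after full GC: " + after);
    }
}
```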
> >>>>>>>
> >>>>>>> There might be a scaling issue with UseParNewGC. If you can
> >>>>>>> use -XX:-UseParNewGC (turning off the parallel young
> >>>>>>> generation collection) with -XX:+UseConcMarkSweepGC, the
> >>>>>>> pauses will be longer but may be more stable. That's not the
> >>>>>>> solution, but just part of the investigation.
> >>>>>>>
> >>>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC
> >>>>>>> and if you don't see the growing young generation pause, that would
> >>>>>>> indicate
> >>>>>>> something specific about promotion into the CMS generation.
> >>>>>>>
> >>>>>>> UseParallelGC is different from UseParNewGC in a number of
> >>>>>>> ways, and if you try UseParallelGC and still see the growing
> >>>>>>> young generation pauses, I'd suspect something special about
> >>>>>>> your application.
> >>>>>>>
> >>>>>>> If you can run these experiments hopefully they will tell
> >>>>>>> us where to look next.
> >>>>>>>
> >>>>>>> Jon
> >>>>>>>
> >>>>>>>
> >>>>>>> On 05/12/10 15:19, Matt Fowles wrote:
> >>>>>>>
> >>>>>>> All~
> >>>>>>>
> >>>>>>> I have a large app that produces ~4g of garbage every 30
> >>>>>>> seconds and am trying to reduce the size of gc outliers.
> >>>>>>> About 99% of this data is garbage, but almost anything that
> >>>>>>> survives one collection survives for an indeterminately long
> >>>>>>> amount of time. We are currently using the following VM and
> >>>>>>> options:
> >>>>>>>
> >>>>>>> java version "1.6.0_20"
> >>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
> >>>>>>>
> >>>>>>> -verbose:gc
> >>>>>>> -XX:+PrintGCTimeStamps
> >>>>>>> -XX:+PrintGCDetails
> >>>>>>> -XX:+PrintGCTaskTimeStamps
> >>>>>>> -XX:+PrintTenuringDistribution
> >>>>>>> -XX:+PrintCommandLineFlags
> >>>>>>> -XX:+PrintReferenceGC
> >>>>>>> -Xms32g -Xmx32g -Xmn4g
> >>>>>>> -XX:+UseParNewGC
> >>>>>>> -XX:ParallelGCThreads=4
> >>>>>>> -XX:+UseConcMarkSweepGC
> >>>>>>> -XX:ParallelCMSThreads=4
> >>>>>>> -XX:CMSInitiatingOccupancyFraction=60
> >>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
> >>>>>>> -XX:+CMSParallelRemarkEnabled
> >>>>>>> -XX:MaxGCPauseMillis=50
> >>>>>>> -Xloggc:gc.log
> >>>>>>>
> >>>>>>>
> >>>>>>> As you can see from the GC log, we never actually reach the
> >>>>>>> point where the CMS kicks in (after app startup). But our
> >>>>>>> young gens seem to take increasingly long to collect as time
> >>>>>>> goes by.
> >>>>>>>
> >>>>>>> The steady state of the app is reached around 956.392 into the
> >>>>>>> log with a collection that takes 0.106 seconds. Thereafter the
> >>>>>>> survivor space remains roughly constantly filled and the amount
> >>>>>>> promoted to old gen also remains constant, but the collection
> >>>>>>> times increase to 2.855 seconds by the end of the 3.5 hour run.
> >>>>>>>
> >>>>>>> Has anyone seen this sort of behavior before? Are there more
> >>>>>>> switches that I should try running with?
> >>>>>>>
> >>>>>>> Obviously, I am working to profile the app and reduce the
> >>>>>>> garbage load in parallel. But if I still see this sort of
> >>>>>>> problem, it is only a question of how long the app must run
> >>>>>>> before I see unacceptable latency spikes.
> >>>>>>>
> >>>>>>> Matt
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> hotspot-gc-use mailing list
> >>>>>>> hotspot-gc-use at openjdk.java.net
> >>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> >>>>>>
>