Growing GC Young Gen Times
Eric Caspole
eric.caspole at amd.com
Thu Jun 3 20:57:07 UTC 2010
Hi Matt,
I had a problem like this in a previous job. It turned out to be
leaking JNI handles. JNI handles get scanned during GC even if they
are stale/useless, so I would carefully inspect your JNI code for
incorrect use of JNI handles.
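
For example, a native method like this hypothetical sketch (the class
and method names are invented, not taken from your code) leaks one
global ref per call, and every leaked handle stays reachable as a GC
root:

  #include <jni.h>

  /* Hypothetical native method: promotes its argument to a global
   * ref on every call but never deletes it.  Each leaked handle is
   * scanned as a root during every GC, so root-scanning time grows
   * with the number of calls. */
  JNIEXPORT void JNICALL
  Java_com_example_NativeCache_remember(JNIEnv *env, jobject self,
                                        jobject value)
  {
      jobject ref = (*env)->NewGlobalRef(env, value);
      /* ... stored in a native table but never removed ... */
      (void)ref;   /* leak: no matching DeleteGlobalRef */
  }

The fix is to call (*env)->DeleteGlobalRef(env, ref) when the native
side is done with the object (and DeleteLocalRef for local refs
created in long-running native code).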
I debugged this problem by taking before/after oprofile runs and
comparing them to see what was taking more time in GC, if that is
any help to you.
Regards,
Eric
On Jun 3, 2010, at 1:17 PM, Matt Fowles wrote:
> All~
>
> Today we were able to isolate the piece of our system that is
> causing the increase in GC times. We can now reproduce the problem
> on multiple machines much faster (in about 20 minutes). The
> attached log represents a run of just this component. There is a
> full GC partway through the run that we forced using VisualVM. The
> following interesting facts are presented for your
> amusement/edification:
>
> 1) the young gen pause times increase steadily over the course of
> the run
> 2) the full GC doesn't affect the young gen pause times
> 3) the component in question makes heavy use of JNI
>
> I suspect the 3rd fact is the most interesting one. We will
> [obviously] be looking into reducing the test case further and
> possibly fixing our JNI code. But this represents a huge leap
> forward in our understanding of the problem.
>
> Matt
>
>
> On Fri, May 14, 2010 at 3:32 PM, Y. Srinivas Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
> Matt -- Yes, comparative data for all of these for 6u20 and JDK 7
> would be great. Naturally, server 1 is the most immediately useful
> for determining whether 6631166 addresses this at all, but the
> others would be useful too if it turns out it doesn't (i.e. if
> JDK 7's server 1 turns out to be no better than 6u20's -- at which
> point we should get this into the right channel: open a bug and a
> support case).
>
> thanks.
> -- ramki
>
>
> On 05/14/10 12:24, Matt Fowles wrote:
> Ramki~
>
> I am preparing the flags for the next 3 runs (which run in
> parallel) and wanted to check a few things with you. I believe
> that each of these is collecting a useful data point.
>
> Server 1 is running with 8 threads, reduced young gen, and MTT 1.
> Server 2 is running with 8 threads, reduced young gen, and MTT 1,
> using ParNew but NOT CMS.
> Server 3 is running with 8 threads, reduced young gen, and MTT 1,
> with PrintFLSStatistics enabled.
>
> I can (additionally) run all of these tests on JDK 7 (Java
> HotSpot(TM) 64-Bit Server VM (build 17.0-b05, mixed mode)).
>
> Server 1:
> -verbose:gc
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTaskTimeStamps
> -XX:+PrintCommandLineFlags
>
> -Xms32g -Xmx32g -Xmn1g
> -XX:+UseParNewGC
> -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC
> -XX:ParallelCMSThreads=8
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=14
> -XX:+CMSParallelRemarkEnabled
> -Xloggc:gc1.log
> -XX:+UseLargePages -XX:+AlwaysPreTouch
>
> Server 2:
> -verbose:gc
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTaskTimeStamps
> -XX:+PrintCommandLineFlags
>
> -Xms32g -Xmx32g -Xmn1g
> -XX:+UseParNewGC
> -XX:ParallelGCThreads=8
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=14
> -Xloggc:gc2.log
> -XX:+UseLargePages -XX:+AlwaysPreTouch
>
>
> Server 3:
> -verbose:gc
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTaskTimeStamps
> -XX:+PrintCommandLineFlags
>
> -Xms32g -Xmx32g -Xmn1g
> -XX:+UseParNewGC
> -XX:ParallelGCThreads=8
> -XX:+UseConcMarkSweepGC
> -XX:ParallelCMSThreads=8
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=14
> -XX:+CMSParallelRemarkEnabled
> -Xloggc:gc3.log
>
> -XX:PrintFLSStatistics=2
> -XX:+UseLargePages
> -XX:+AlwaysPreTouch
> Matt
>
> On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
> > On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote:
> >>
> >> On 05/14/10 10:24, Matt Fowles wrote:
> >>>
> >>> Jon~
> >>>
> >>> That makes sense, but the fact is that the old gen *never* gets
> >>> collected. So all the allocations happen from the giant empty
> >>> space at the end of the free list. I thought fragmentation only
> >>> occurred when the free lists are added to after freeing memory...
> >>
> >> As Jon indicated, allocation is done from free lists of blocks
> >> that are pre-carved on demand to avoid contention while
> >> allocating. The old heuristics for how large to make those lists
> >> and how much inventory to hold in them were not working well as
> >> you scaled the number of workers. Following 6631166 we believe it
> >> works better and causes both less contention and less
> >> fragmentation than it did before, because we do not hold
> >> unnecessary excess inventory of free blocks.
> >
> > To see what the fragmentation is, try -XX:PrintFLSStatistics=2.
> > This will slow down your scavenge pauses (perhaps by quite a bit
> > for your 26 GB heap), but you will get a report of the number of
> > blocks on free lists and how fragmented the space is on that count
> > (for some appropriate notion of fragmentation). Don't use that
> > flag in production though :-)
> >
> > -- ramki
> >
> >>
> >> The fragmentation in turn causes card-scanning to suffer
> >> adversely; beyond that, the loss of spatial locality also
> >> increases cache misses and TLB misses. (The large page
> >> option might help mitigate the latter a bit, especially
> >> since you have such a large heap and our fragmented
> >> allocation may be exacerbating the TLB pressure.)
> >>
> >> -- ramki
> >>
> >>> Matt
> >>>
> >>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu
> >>> <jon.masamitsu at oracle.com> wrote:
> >>>>
> >>>> Matt,
> >>>>
> >>>> To amplify on Ramki's comment, the allocations out of the
> >>>> old generation are always from a free list. During a young
> >>>> generation collection each GC thread will get its own
> >>>> local free lists from the old generation so that it can
> >>>> copy objects to the old generation without synchronizing
> >>>> with the other GC threads (most of the time). Objects from
> >>>> a GC thread's local free lists are pushed to the global lists
> >>>> after the collection (as far as I recall). So there is some
> >>>> churn in the free lists.
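> >>>>
> >>>> A toy sketch in C of that scheme (illustrative only, not
> >>>> HotSpot's actual code): each worker carves a batch of blocks
> >>>> off the shared list under a lock, allocates from its private
> >>>> list with no further synchronization, and pushes the leftovers
> >>>> back after the collection.
> >>>>
> >>>>   #include <stddef.h>
> >>>>   #include <pthread.h>
> >>>>
> >>>>   struct block { struct block *next; };
> >>>>
> >>>>   static struct block *shared_list;   /* global free list */
> >>>>   static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
> >>>>
> >>>>   /* Carve up to n blocks off the shared list for one worker. */
> >>>>   static struct block *carve_local_list(size_t n)
> >>>>   {
> >>>>       struct block *head = NULL;
> >>>>       pthread_mutex_lock(&shared_lock);
> >>>>       while (n-- > 0 && shared_list != NULL) {
> >>>>           struct block *b = shared_list;
> >>>>           shared_list = b->next;
> >>>>           b->next = head;
> >>>>           head = b;
> >>>>       }
> >>>>       pthread_mutex_unlock(&shared_lock);
> >>>>       return head;   /* the worker pops from head, lock-free */
> >>>>   }
> >>>>
> >>>>   /* Push a worker's unused blocks back after the collection. */
> >>>>   static void flush_local_list(struct block *head)
> >>>>   {
> >>>>       pthread_mutex_lock(&shared_lock);
> >>>>       while (head != NULL) {
> >>>>           struct block *b = head;
> >>>>           head = b->next;
> >>>>           b->next = shared_list;
> >>>>           shared_list = b;
> >>>>       }
> >>>>       pthread_mutex_unlock(&shared_lock);
> >>>>   }
> >>>>
> >>>> The churn above is exactly this carve/flush cycle; per Ramki's
> >>>> note earlier in the thread, 6631166 tunes how much inventory
> >>>> each worker holds.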
> >>>>
> >>>> Jon
> >>>>
> >>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
> >>>>>
> >>>>> On 05/13/10 10:50, Matt Fowles wrote:
> >>>>>>
> >>>>>> Jon~
> >>>>>>
> >>>>>> This may sound naive, but how can fragmentation be an issue
> >>>>>> if the old gen has never been collected? I would think we are
> >>>>>> still in the space where we can just bump the old gen alloc
> >>>>>> pointer...
> >>>>>
> >>>>> Matt, the old gen allocator may fragment the space.
> >>>>> Allocation is not exactly "bump a pointer".
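> >>>>>
> >>>>> For contrast, here is what "bump a pointer" allocation means,
> >>>>> as a toy C sketch; the CMS old gen instead searches free
> >>>>> lists, which is how the space can fragment even before any
> >>>>> old gen collection:
> >>>>>
> >>>>>   #include <stddef.h>
> >>>>>
> >>>>>   /* Toy bump-pointer allocator: one contiguous region, one
> >>>>>    * cursor; allocation is a bounds check plus an addition. */
> >>>>>   static char region[1 << 20];   /* illustrative 1 MB region */
> >>>>>   static size_t top;             /* current bump cursor */
> >>>>>
> >>>>>   static void *bump_alloc(size_t n)
> >>>>>   {
> >>>>>       if (n > sizeof region - top)
> >>>>>           return NULL;           /* region exhausted */
> >>>>>       void *p = region + top;
> >>>>>       top += n;
> >>>>>       return p;
> >>>>>   }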
> >>>>>
> >>>>> -- ramki
> >>>>>
> >>>>>> Matt
> >>>>>>
> >>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
> >>>>>> <jon.masamitsu at oracle.com> wrote:
> >>>>>>>
> >>>>>>> Matt,
> >>>>>>>
> >>>>>>> As Ramki indicated, fragmentation might be an issue. As the
> >>>>>>> fragmentation in the old generation increases, it takes
> >>>>>>> longer to find space in the old generation into which to
> >>>>>>> promote objects from the young generation. This is
> >>>>>>> apparently not the problem that Wayne is having, but you
> >>>>>>> still might be hitting it. If you can connect jconsole to
> >>>>>>> the VM and force a full GC, that would tell us if it's
> >>>>>>> fragmentation.
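> >>>>>>>
> >>>>>>> (If attaching jconsole is inconvenient, a minimal sketch of
> >>>>>>> requesting a collection from your own JNI code -- assuming
> >>>>>>> a valid JNIEnv and that explicit GC is not disabled -- is:
> >>>>>>>
> >>>>>>>   #include <jni.h>
> >>>>>>>
> >>>>>>>   /* Calls java.lang.System.gc() through JNI.  With CMS this
> >>>>>>>    * requests a full stop-the-world collection unless
> >>>>>>>    * -XX:+DisableExplicitGC or a concurrent variant is set. */
> >>>>>>>   static void request_full_gc(JNIEnv *env)
> >>>>>>>   {
> >>>>>>>       jclass sys = (*env)->FindClass(env, "java/lang/System");
> >>>>>>>       if (sys == NULL)
> >>>>>>>           return;   /* lookup failed; exception pending */
> >>>>>>>       jmethodID gc =
> >>>>>>>           (*env)->GetStaticMethodID(env, sys, "gc", "()V");
> >>>>>>>       if (gc != NULL)
> >>>>>>>           (*env)->CallStaticVoidMethod(env, sys, gc);
> >>>>>>>   }
> >>>>>>>
> >>>>>>> If the young gen pause times drop back down after the full
> >>>>>>> GC, that points at fragmentation.)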
> >>>>>>>
> >>>>>>> There might be a scaling issue with UseParNewGC. If you can
> >>>>>>> use -XX:-UseParNewGC (turning off the parallel young
> >>>>>>> generation collection) with -XX:+UseConcMarkSweepGC, the
> >>>>>>> pauses will be longer but may be more stable. That's not the
> >>>>>>> solution, but just part of the investigation.
> >>>>>>>
> >>>>>>> You could try just -XX:+UseParNewGC without
> >>>>>>> -XX:+UseConcMarkSweepGC, and if you don't see the growing
> >>>>>>> young generation pauses, that would indicate something
> >>>>>>> specific about promotion into the CMS generation.
> >>>>>>>
> >>>>>>> UseParallelGC is different from UseParNewGC in a number of
> >>>>>>> ways, and if you try UseParallelGC and still see the growing
> >>>>>>> young generation pauses, I'd suspect something special about
> >>>>>>> your application.
> >>>>>>>
> >>>>>>> If you can run these experiments, hopefully they will tell
> >>>>>>> us where to look next.
> >>>>>>>
> >>>>>>> Jon
> >>>>>>>
> >>>>>>>
> >>>>>>> On 05/12/10 15:19, Matt Fowles wrote:
> >>>>>>>
> >>>>>>> All~
> >>>>>>>
> >>>>>>> I have a large app that produces ~4g of garbage every 30
> >>>>>>> seconds and am trying to reduce the size of GC outliers.
> >>>>>>> About 99% of this data is garbage, but almost anything that
> >>>>>>> survives one collection survives for an indeterminately long
> >>>>>>> amount of time. We are currently using the following VM and
> >>>>>>> options:
> >>>>>>>
> >>>>>>> java version "1.6.0_20"
> >>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
> >>>>>>>
> >>>>>>> -verbose:gc
> >>>>>>> -XX:+PrintGCTimeStamps
> >>>>>>> -XX:+PrintGCDetails
> >>>>>>> -XX:+PrintGCTaskTimeStamps
> >>>>>>> -XX:+PrintTenuringDistribution
> >>>>>>> -XX:+PrintCommandLineFlags
> >>>>>>> -XX:+PrintReferenceGC
> >>>>>>> -Xms32g -Xmx32g -Xmn4g
> >>>>>>> -XX:+UseParNewGC
> >>>>>>> -XX:ParallelGCThreads=4
> >>>>>>> -XX:+UseConcMarkSweepGC
> >>>>>>> -XX:ParallelCMSThreads=4
> >>>>>>> -XX:CMSInitiatingOccupancyFraction=60
> >>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
> >>>>>>> -XX:+CMSParallelRemarkEnabled
> >>>>>>> -XX:MaxGCPauseMillis=50
> >>>>>>> -Xloggc:gc.log
> >>>>>>>
> >>>>>>>
> >>>>>>> As you can see from the GC log, we never actually reach the
> >>>>>>> point where the CMS kicks in (after app startup). But our
> >>>>>>> young gen collections seem to take increasingly long as time
> >>>>>>> goes by.
> >>>>>>>
> >>>>>>> The steady state of the app is reached around 956.392
> >>>>>>> seconds into the log with a collection that takes 0.106
> >>>>>>> seconds. Thereafter the survivor space occupancy remains
> >>>>>>> roughly constant and the amount promoted to old gen also
> >>>>>>> remains constant, but the collection times increase to
> >>>>>>> 2.855 seconds by the end of the 3.5 hour run.
> >>>>>>>
> >>>>>>> Has anyone seen this sort of behavior before? Are there
> >>>>>>> more switches that I should try running with?
> >>>>>>>
> >>>>>>> Obviously, I am working in parallel to profile the app and
> >>>>>>> reduce the garbage load. But if I still see this sort of
> >>>>>>> problem, it is only a question of how long the app must run
> >>>>>>> before I see unacceptable latency spikes.
> >>>>>>>
> >>>>>>> Matt
> >>>>>>>
> <gc.log.gz>
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use