Growing GC Young Gen Times
Matt Fowles
matt.fowles at gmail.com
Thu Jun 3 14:40:11 PDT 2010
Eric~
That is my suspicion as well. It would be nice if there were a flag or
something to print stats for JNI handles...
Matt
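
(For readers not steeped in JNI, here is a minimal C++ sketch -- the class
and method names are hypothetical, not from this thread -- of the two usual
ways JNI handles leak. Every leaked handle remains a GC root and is scanned
at each young collection, which is how a native-side leak can show up as
steadily growing scavenge times.)

    #include <jni.h>

    extern "C" JNIEXPORT void JNICALL
    Java_com_example_Native_process(JNIEnv* env, jclass cls, jobjectArray items) {
        jsize n = env->GetArrayLength(items);
        for (jsize i = 0; i < n; i++) {
            jobject item = env->GetObjectArrayElement(items, i); // new local ref
            // ... use item ...
            // Leak pattern #1: omitting this in a long loop lets local handles
            // pile up until the native frame returns.
            env->DeleteLocalRef(item);
        }

        // Leak pattern #2: a NewGlobalRef per call that is never released
        // stays a root for the life of the VM.
        jobject keep = env->NewGlobalRef(cls);
        // ... hand 'keep' to code that needs it across calls, and release it
        // with DeleteGlobalRef once it is no longer needed:
        env->DeleteGlobalRef(keep);
    }

(PushLocalFrame/PopLocalFrame are another way to bound the local references
created inside a loop.)
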
On Thu, Jun 3, 2010 at 4:57 PM, Eric Caspole <eric.caspole at amd.com> wrote:
> Hi Matt,
> I had a problem like this in a previous job. It turned out to be leaking
> JNI handles. The JNI handles get scanned during the GC even if they are
> stale/useless. So I would carefully inspect your JNI code for incorrect use
> of JNI handles.
>
> I debugged this problem by doing before/after oprofiles and comparing the
> runs to see what was taking more time in gc, if that is any help to you.
> Regards,
> Eric
>
>
>
> On Jun 3, 2010, at 1:17 PM, Matt Fowles wrote:
>
>> All~
>>
>> Today we were able to isolate the piece of our system that is causing the
>> increase in GC times. We can now reproduce it on multiple machines in a much
>> faster manner (about 20 minutes). The attached log represents a run from
>> just this component. There is a full GC partway through the run that we
>> forced using VisualVM. The following interesting facts are presented for
>> your amusement/edification:
>>
>> 1) the young gen pause times increase steadily over the course of the run
>> 2) the full GC doesn't affect the young gen pause times
>> 3) the component in question makes heavy use of JNI
>>
>> I suspect the 3rd fact is the most interesting one. We will [obviously]
>> be looking into reducing the test case further and possibly fixing our JNI
>> code. But this represents a huge leap forward in our understanding of the
>> problem.
>>
>> Matt
>>
>>
>> On Fri, May 14, 2010 at 3:32 PM, Y. Srinivas Ramakrishna <
>> y.s.ramakrishna at oracle.com> wrote:
>> Matt -- Yes, comparative data for all these for 6u20 and jdk 7
>> would be great. Naturally, server 1 is most immediately useful
>> for determining if 6631166 addresses this at all,
>> but others would be useful too if it turns out it doesn't
>> (i.e. if jdk 7's server 1 turns out to be no better than 6u20's --
>> at which point we should get this into the right channel -- open a bug,
>> and a support case).
>>
>> thanks.
>> -- ramki
>>
>>
>> On 05/14/10 12:24, Matt Fowles wrote:
>> Ramki~
>>
>> I am preparing the flags for the next 3 runs (which run in parallel) and
>> wanted to check a few things with you. I believe that each of these is
>> collecting a useful data point.
>>
>> Server 1 is running with 8 threads, reduced young gen, and MTT 1.
>> Server 2 is running with 8 threads, reduced young gen, MTT 1, and ParNew,
>> but NOT CMS.
>> Server 3 is running with 8 threads, reduced young gen, MTT 1, and
>> PrintFLSStatistics.
>>
>> I can (additionally) run all of these tests on JDK7 (Java HotSpot(TM)
>> 64-Bit Server VM (build 17.0-b05, mixed mode)).
>>
>> Server 1:
>> -verbose:gc
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCTaskTimeStamps
>> -XX:+PrintCommandLineFlags
>>
>> -Xms32g -Xmx32g -Xmn1g
>> -XX:+UseParNewGC
>> -XX:ParallelGCThreads=8
>> -XX:+UseConcMarkSweepGC
>> -XX:ParallelCMSThreads=8
>> -XX:MaxTenuringThreshold=1
>> -XX:SurvivorRatio=14
>> -XX:+CMSParallelRemarkEnabled
>> -Xloggc:gc1.log
>> -XX:+UseLargePages -XX:+AlwaysPreTouch
>>
>> Server 2:
>> -verbose:gc
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCTaskTimeStamps
>> -XX:+PrintCommandLineFlags
>>
>> -Xms32g -Xmx32g -Xmn1g
>> -XX:+UseParNewGC
>> -XX:ParallelGCThreads=8
>> -XX:MaxTenuringThreshold=1
>> -XX:SurvivorRatio=14
>> -Xloggc:gc2.log
>> -XX:+UseLargePages -XX:+AlwaysPreTouch
>>
>>
>> Server 3:
>> -verbose:gc
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCTaskTimeStamps
>> -XX:+PrintCommandLineFlags
>>
>> -Xms32g -Xmx32g -Xmn1g
>> -XX:+UseParNewGC
>> -XX:ParallelGCThreads=8
>> -XX:+UseConcMarkSweepGC
>> -XX:ParallelCMSThreads=8
>> -XX:MaxTenuringThreshold=1
>> -XX:SurvivorRatio=14
>> -XX:+CMSParallelRemarkEnabled
>> -Xloggc:gc3.log
>> -XX:PrintFLSStatistics=2
>> -XX:+UseLargePages -XX:+AlwaysPreTouch
>>
>> Matt
>>
>> On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna
>> <y.s.ramakrishna at oracle.com> wrote:
>> > On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote:
>> >>
>> >> On 05/14/10 10:24, Matt Fowles wrote:
>> >>>
>> >>> Jon~
>> >>>
>> >>> That makes sense, but the fact is that the old gen *never* gets
>> >>> collected. So all the allocations happen from the giant empty space
>> >>> at the end of the free list. I thought fragmentation only occurred
>> >>> when the free lists are added to after freeing memory...
>> >>
>> >> As Jon indicated, allocation is done from free lists of blocks
>> >> that are pre-carved on demand to avoid contention while allocating.
>> >> The old heuristics for how large to make those lists and the
>> >> inventory to hold in those lists were not working well as you
>> >> scaled the number of workers. Following 6631166 we believe it
>> >> works better and causes both less contention and less
>> >> fragmentation than it did before, because we do not hold
>> >> unnecessary excess inventory of free blocks.
>> >
>> > To see what the fragmentation is, try -XX:PrintFLSStatistics=2.
>> > This will slow down your scavenge pauses (perhaps by quite a bit
>> > for your 26 GB heap), but you will get a report of the number of
>> > blocks on free lists and how fragmented the space is on that count
>> > (for some appropriate notion of fragmentation). Don't use that
>> > flag in production though :-)
>> >
>> > -- ramki
>> >
>> >>
>> >> The fragmentation in turn causes card-scanning to suffer,
>> >> and the accompanying loss of spatial locality also
>> >> increases cache misses and TLB misses. (The large page
>> >> option might help mitigate the latter a bit, especially
>> >> since you have such a large heap and our fragmented
>> >> allocation may be exacerbating the TLB pressure.)
>> >>
>> >> -- ramki
>> >>
>> >>> Matt
>> >>>
>> >>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu
>> >>> <jon.masamitsu at oracle.com> wrote:
>> >>>>
>> >>>> Matt,
>> >>>>
>> >>>> To amplify on Ramki's comment, the allocations out of the
>> >>>> old generation are always from a free list. During a young
>> >>>> generation collection each GC thread will get its own
>> >>>> local free lists from the old generation so that it can
>> >>>> copy objects to the old generation without synchronizing
>> >>>> with the other GC threads (most of the time). Objects from
>> >>>> a GC thread's local free lists are pushed to the global lists
>> >>>> after the collection (as far as I recall). So there is some
>> >>>> churn in the free lists.
>> >>>>
>> >>>> Jon
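
A rough illustration of the scheme Jon describes above -- plain C++ with
invented names, not HotSpot code: each GC worker carves a private batch of
old-gen blocks up front, promotes into it without further locking, and
returns the leftovers to the global lists afterwards, which is the churn he
mentions.

    #include <cstddef>
    #include <mutex>
    #include <vector>

    struct Block { void* addr; size_t size; };

    struct OldGenFreeLists {
        std::mutex lock;
        std::vector<Block> global_free_list;

        // At the start of a scavenge, a GC worker takes a batch of blocks
        // under the lock, then promotes into them without synchronizing.
        std::vector<Block> carve_local_list(size_t count) {
            std::lock_guard<std::mutex> g(lock);
            std::vector<Block> local;
            while (count > 0 && !global_free_list.empty()) {
                local.push_back(global_free_list.back());
                global_free_list.pop_back();
                --count;
            }
            return local;
        }

        // After the scavenge, unused blocks go back to the global lists;
        // this back-and-forth is a source of free-list churn.
        void return_local_list(std::vector<Block>& local) {
            std::lock_guard<std::mutex> g(lock);
            for (const Block& b : local) global_free_list.push_back(b);
            local.clear();
        }
    };
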
>> >>>>
>> >>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
>> >>>>>
>> >>>>> On 05/13/10 10:50, Matt Fowles wrote:
>> >>>>>>
>> >>>>>> Jon~
>> >>>>>>
>> >>>>>> This may sound naive, but how can fragmentation be an issue if the
>> >>>>>> old gen has never been collected? I would think we are still in the
>> >>>>>> space where we can just bump the old gen alloc pointer...
>> >>>>>
>> >>>>> Matt, the old gen allocator may fragment the space. Allocation is
>> >>>>> not exactly "bump a pointer".
>> >>>>>
>> >>>>> -- ramki
>> >>>>>
>> >>>>>> Matt
>> >>>>>>
>> >>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
>> >>>>>> <jon.masamitsu at oracle.com> wrote:
>> >>>>>>>
>> >>>>>>> Matt,
>> >>>>>>>
>> >>>>>>> As Ramki indicated fragmentation might be an issue. As the
>> >>>>>>> fragmentation in the old generation increases, it takes longer to
>> >>>>>>> find space in the old generation into which to promote objects
>> >>>>>>> from the young generation. This is apparently not the problem that
>> >>>>>>> Wayne is having but you still might be hitting it. If you can
>> >>>>>>> connect jconsole to the VM and force a full GC, that would tell us
>> >>>>>>> if it's fragmentation.
>> >>>>>>>
>> >>>>>>> There might be a scaling issue with UseParNewGC. If you can use
>> >>>>>>> -XX:-UseParNewGC (turning off the parallel young generation
>> >>>>>>> collection) with -XX:+UseConcMarkSweepGC, the pauses will be longer
>> >>>>>>> but may be more stable. That's not the solution but just part of
>> >>>>>>> the investigation.
>> >>>>>>>
>> >>>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC,
>> >>>>>>> and if you don't see the growing young generation pause, that would
>> >>>>>>> indicate something specific about promotion into the CMS generation.
>> >>>>>>>
>> >>>>>>> UseParallelGC is different from UseParNewGC in a number of ways,
>> >>>>>>> and if you try UseParallelGC and still see the growing young
>> >>>>>>> generation pauses, I'd suspect something special about your
>> >>>>>>> application.
>> >>>>>>>
>> >>>>>>> If you can run these experiments hopefully they will tell
>> >>>>>>> us where to look next.
>> >>>>>>>
>> >>>>>>> Jon
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 05/12/10 15:19, Matt Fowles wrote:
>> >>>>>>>
>> >>>>>>> All~
>> >>>>>>>
>> >>>>>>> I have a large app that produces ~4g of garbage every 30 seconds
>> >>>>>>> and am trying to reduce the size of GC outliers. About 99% of this
>> >>>>>>> data is garbage, but almost anything that survives one collection
>> >>>>>>> survives for an indeterminately long amount of time. We are
>> >>>>>>> currently using the following VM and options:
>> >>>>>>>
>> >>>>>>> java version "1.6.0_20"
>> >>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>> >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>> >>>>>>>
>> >>>>>>> -verbose:gc
>> >>>>>>> -XX:+PrintGCTimeStamps
>> >>>>>>> -XX:+PrintGCDetails
>> >>>>>>> -XX:+PrintGCTaskTimeStamps
>> >>>>>>> -XX:+PrintTenuringDistribution
>> >>>>>>> -XX:+PrintCommandLineFlags
>> >>>>>>> -XX:+PrintReferenceGC
>> >>>>>>> -Xms32g -Xmx32g -Xmn4g
>> >>>>>>> -XX:+UseParNewGC
>> >>>>>>> -XX:ParallelGCThreads=4
>> >>>>>>> -XX:+UseConcMarkSweepGC
>> >>>>>>> -XX:ParallelCMSThreads=4
>> >>>>>>> -XX:CMSInitiatingOccupancyFraction=60
>> >>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>> >>>>>>> -XX:+CMSParallelRemarkEnabled
>> >>>>>>> -XX:MaxGCPauseMillis=50
>> >>>>>>> -Xloggc:gc.log
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> As you can see from the GC log, we never actually reach the point
>> >>>>>>> where the CMS kicks in (after app startup). But our young gens seem
>> >>>>>>> to take increasingly long to collect as time goes by.
>> >>>>>>>
>> >>>>>>> The steady state of the app is reached around 956.392 into the log
>> >>>>>>> with a collection that takes 0.106 seconds. Thereafter the survivor
>> >>>>>>> space occupancy remains roughly constant and the amount promoted to
>> >>>>>>> old gen also remains constant, but the collection times increase to
>> >>>>>>> 2.855 seconds by the end of the 3.5 hour run.
>> >>>>>>>
>> >>>>>>> Has anyone seen this sort of behavior before? Are there more
>> >>>>>>> switches that I should try running with?
>> >>>>>>>
>> >>>>>>> Obviously, I am working to profile the app and reduce the garbage
>> >>>>>>> load in parallel. But if I still see this sort of problem, it is
>> >>>>>>> only a question of how long the app must run before I see
>> >>>>>>> unacceptable latency spikes.
>> >>>>>>>
>> >>>>>>> Matt
>> >>>>>>>
>> >>>>>>
>> >>
>> >
>> >
>>
>>
>>
>> <gc.log.gz>
>>
>
>
>