Growing GC Young Gen Times

Matt Fowles matt.fowles at gmail.com
Fri May 14 12:24:11 PDT 2010


Ramki~

I am preparing the flags for the next 3 runs (which run in parallel) and
wanted to check a few things with you.  I believe that each of these is
collecting a useful data point:

Server 1 is running with 8 threads, reduced young gen, and MTT 1.
Server 2 is running with 8 threads, reduced young gen, MTT 1, and ParNew,
but NOT CMS.
Server 3 is running with 8 threads, reduced young gen, MTT 1, and
PrintFLSStatistics.

I can (additionally) run all of these tests on JDK7 (Java HotSpot(TM) 64-Bit
Server VM (build 17.0-b05, mixed mode)).

Server 1:
            -verbose:gc
            -XX:+PrintGCTimeStamps
            -XX:+PrintGCDetails
            -XX:+PrintGCTaskTimeStamps
            -XX:+PrintCommandLineFlags

            -Xms32g -Xmx32g -Xmn1g
            -XX:+UseParNewGC
            -XX:ParallelGCThreads=8
            -XX:+UseConcMarkSweepGC
            -XX:ParallelCMSThreads=8
            -XX:MaxTenuringThreshold=1
            -XX:SurvivorRatio=14
            -XX:+CMSParallelRemarkEnabled
            -Xloggc:gc1.log
            -XX:+UseLargePages
            -XX:+AlwaysPreTouch

Server 2:
            -verbose:gc
            -XX:+PrintGCTimeStamps
            -XX:+PrintGCDetails
            -XX:+PrintGCTaskTimeStamps
            -XX:+PrintCommandLineFlags

            -Xms32g -Xmx32g -Xmn1g
            -XX:+UseParNewGC
            -XX:ParallelGCThreads=8
            -XX:MaxTenuringThreshold=1
            -XX:SurvivorRatio=14
            -Xloggc:gc2.log
            -XX:+UseLargePages
            -XX:+AlwaysPreTouch


Server 3:
            -verbose:gc
            -XX:+PrintGCTimeStamps
            -XX:+PrintGCDetails
            -XX:+PrintGCTaskTimeStamps
            -XX:+PrintCommandLineFlags

            -Xms32g -Xmx32g -Xmn1g
            -XX:+UseParNewGC
            -XX:ParallelGCThreads=8
            -XX:+UseConcMarkSweepGC
            -XX:ParallelCMSThreads=8
            -XX:MaxTenuringThreshold=1
            -XX:SurvivorRatio=14
            -XX:+CMSParallelRemarkEnabled
            -Xloggc:gc3.log

            -XX:PrintFLSStatistics=2
            -XX:+UseLargePages
            -XX:+AlwaysPreTouch
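
For reference, the Server 3 flags above assemble into a single launch line
roughly like the one below (the jar name is just a placeholder, not the real
entry point, so treat this as a sketch of the invocation rather than the
exact command):

            java \
                -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
                -XX:+PrintGCTaskTimeStamps -XX:+PrintCommandLineFlags \
                -Xms32g -Xmx32g -Xmn1g \
                -XX:+UseParNewGC -XX:ParallelGCThreads=8 \
                -XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=8 \
                -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=14 \
                -XX:+CMSParallelRemarkEnabled \
                -XX:PrintFLSStatistics=2 \
                -XX:+UseLargePages -XX:+AlwaysPreTouch \
                -Xloggc:gc3.log \
                -jar server.jar

Since -XX:+PrintCommandLineFlags echoes the effective flags at VM startup,
it is easy to confirm that each server actually picked up its intended set.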

Matt

On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna <
y.s.ramakrishna at oracle.com> wrote:
> On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote:
>>
>> On 05/14/10 10:24, Matt Fowles wrote:
>>>
>>> Jon~
>>>
>>> That makes sense, but the fact is that the old gen *never* gets
>>> collected.  So all the allocations happen from the giant empty space
>>> at the end of the free list.  I thought fragmentation only occurred
>>> when the free lists are added to after freeing memory...
>>
>> As Jon indicated allocation is done from free lists of blocks
>> that are pre-carved on demand to avoid contention while allocating.
>> The old heuristics for how large to make those lists and the
>> inventory to hold in those lists were not working well as you
>> scaled the number of workers. Following 6631166 we believe it
>> works better and causes both less contention and less
>> fragmentation than it did before, because we do not hold
>> unnecessary excess inventory of free blocks.
>
> To see what the fragmentation is, try -XX:PrintFLSStatistics=2.
> This will slow down your scavenge pauses (perhaps by quite a bit
> for your 26 GB heap), but you will get a report of the number of
> blocks on free lists and how fragmented the space is on that count
> (for some appropriate notion of fragmentation). Don't use that
> flag in production though :-)
>
> -- ramki
>
>>
>> The fragmentation in turn causes card-scanning to suffer
>> adversely; the accompanying loss of spatial locality also
>> increases cache and TLB misses. (The large page
>> option might help mitigate the latter a bit, especially
>> since you have such a large heap and our fragmented
>> allocation may be exacerbating the TLB pressure.)
>>
>> -- ramki
>>
>>> Matt
>>>
>>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu
>>> <jon.masamitsu at oracle.com> wrote:
>>>>
>>>> Matt,
>>>>
>>>> To amplify on Ramki's comment, the allocations out of the
>>>> old generation are always from a free list.  During a young
>>>> generation collection each GC thread will get its own
>>>> local free lists from the old generation so that it can
>>>> copy objects to the old generation without synchronizing
>>>> with the other GC threads (most of the time).  Objects from
>>>> a GC thread's local free lists are pushed to the global lists
>>>> after the collection (as far as I recall). So there is some
>>>> churn in the free lists.
>>>>
>>>> Jon
>>>>
>>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote:
>>>>>
>>>>> On 05/13/10 10:50, Matt Fowles wrote:
>>>>>>
>>>>>> Jon~
>>>>>>
>>>>>> This may sound naive, but how can fragmentation be an issue if the
>>>>>> old gen has never been collected?  I would think we are still in the
>>>>>> space where we can just bump the old gen alloc pointer...
>>>>>
>>>>> Matt, The old gen allocator may fragment the space. Allocation is not
>>>>> exactly "bump a pointer".
>>>>>
>>>>> -- ramki
>>>>>
>>>>>> Matt
>>>>>>
>>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu
>>>>>> <jon.masamitsu at oracle.com> wrote:
>>>>>>>
>>>>>>> Matt,
>>>>>>>
>>>>>>> As Ramki indicated fragmentation might be an issue.  As the
>>>>>>> fragmentation in the old generation increases, it takes longer to
>>>>>>> find space in the old generation into which to promote objects from
>>>>>>> the young generation.  This is apparently not the problem that Wayne
>>>>>>> is having but you still might be hitting it.  If you can connect
>>>>>>> jconsole to the VM and force a full GC, that would tell us if it's
>>>>>>> fragmentation.
>>>>>>>
>>>>>>> There might be a scaling issue with UseParNewGC.  If you can use
>>>>>>> -XX:-UseParNewGC (turning off the parallel young generation
>>>>>>> collection) with -XX:+UseConcMarkSweepGC the pauses will be longer
>>>>>>> but may be more stable.  That's not the solution but just part of
>>>>>>> the investigation.
>>>>>>>
>>>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC
>>>>>>> and if you don't see the growing young generation pause, that
>>>>>>> would indicate something specific about promotion into the CMS
>>>>>>> generation.
>>>>>>>
>>>>>>> UseParallelGC is different from UseParNewGC in a number of ways,
>>>>>>> and if you try UseParallelGC and still see the growing young
>>>>>>> generation pauses, I'd suspect something special about your
>>>>>>> application.
>>>>>>>
>>>>>>> If you can run these experiments hopefully they will tell
>>>>>>> us where to look next.
>>>>>>>
>>>>>>> Jon
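
For concreteness, the three experiments described above map onto flag sets
roughly like the ones below, keeping the heap sizing and thread counts from
Matt's original run; these are illustrative combinations, not the exact
command lines that were run:

            Experiment A (CMS with single-threaded young collections):
                -Xms32g -Xmx32g -Xmn4g
                -XX:-UseParNewGC
                -XX:+UseConcMarkSweepGC
                -XX:ParallelCMSThreads=4

            Experiment B (ParNew alone, no CMS):
                -Xms32g -Xmx32g -Xmn4g
                -XX:+UseParNewGC
                -XX:ParallelGCThreads=4

            Experiment C (throughput collector):
                -Xms32g -Xmx32g -Xmn4g
                -XX:+UseParallelGC
                -XX:ParallelGCThreads=4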
>>>>>>>
>>>>>>>
>>>>>>> On 05/12/10 15:19, Matt Fowles wrote:
>>>>>>>
>>>>>>> All~
>>>>>>>
>>>>>>> I have a large app that produces ~4g of garbage every 30 seconds and
>>>>>>> am trying to reduce the size of gc outliers.  About 99% of this data
>>>>>>> is garbage, but almost anything that survives one collection
>>>>>>> survives for an indeterminately long amount of time.  We are
>>>>>>> currently using
>>>>>>> the following VM and options:
>>>>>>>
>>>>>>> java version "1.6.0_20"
>>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>>>>>
>>>>>>>              -verbose:gc
>>>>>>>              -XX:+PrintGCTimeStamps
>>>>>>>              -XX:+PrintGCDetails
>>>>>>>              -XX:+PrintGCTaskTimeStamps
>>>>>>>              -XX:+PrintTenuringDistribution
>>>>>>>              -XX:+PrintCommandLineFlags
>>>>>>>              -XX:+PrintReferenceGC
>>>>>>>              -Xms32g -Xmx32g -Xmn4g
>>>>>>>              -XX:+UseParNewGC
>>>>>>>              -XX:ParallelGCThreads=4
>>>>>>>              -XX:+UseConcMarkSweepGC
>>>>>>>              -XX:ParallelCMSThreads=4
>>>>>>>              -XX:CMSInitiatingOccupancyFraction=60
>>>>>>>              -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>>              -XX:+CMSParallelRemarkEnabled
>>>>>>>              -XX:MaxGCPauseMillis=50
>>>>>>>              -Xloggc:gc.log
>>>>>>>
>>>>>>>
>>>>>>> As you can see from the GC log, we never actually reach the point
>>>>>>> where the CMS kicks in (after app startup).  But our young gens seem
>>>>>>> to take increasingly long to collect as time goes by.
>>>>>>>
>>>>>>> The steady state of the app is reached around 956.392 seconds into
>>>>>>> the log with a collection that takes 0.106 seconds.  Thereafter the
>>>>>>> survivor space occupancy remains roughly constant and the amount
>>>>>>> promoted to old gen also remains constant, but the collection times
>>>>>>> increase to 2.855 seconds by the end of the 3.5 hour run.
>>>>>>>
>>>>>>> Has anyone seen this sort of behavior before?  Are there more
>>>>>>> switches that I should try running with?
>>>>>>>
>>>>>>> Obviously, I am working to profile the app and reduce the garbage
>>>>>>> load in parallel.  But if I still see this sort of problem, it is
>>>>>>> only a question of how long the app must run before I see
>>>>>>> unacceptable latency spikes.
>>>>>>>
>>>>>>> Matt
>>>>>>>