Growing GC Young Gen Times
Y. Srinivas Ramakrishna
y.s.ramakrishna at oracle.com
Fri May 14 10:23:38 PDT 2010
On 05/14/10 10:07, Matt Fowles wrote:
> Ramki~
>
> The machine has 4 cpus, each of which has 4 cores. I will adjust the
Great, thanks. I'd suggest setting ParallelGCThreads=8. Also compare with
-XX:-UseParNewGC. If it's the kind of fragmentation that we
believe may be the cause here, you'd see larger gc times in the
latter case, but they would not increase as they do now. That
is conjecture at this point, though.
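Concretely, the two runs I have in mind would look something like this
(a rough sketch; only the young-gen flags are shown, everything else
stays as in your current command line):

  run A (parallel young collections, more GC threads):
    -XX:+UseParNewGC
    -XX:ParallelGCThreads=8

  run B (serial young collections, for comparison):
    -XX:-UseParNewGC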
> survivor spaces as you suggest. Previously I had been running with
> MTT 0, but changed it to 4 at the suggestion of others.
MTT=0 can give very poor performance; as others said, MTT=4
is definitely better here than MTT=0.
You should use MTT=1 here, though.
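To make the survivor-space arithmetic concrete (my numbers, a rough
sketch only): with -Xmn4g and what looks like the default
SurvivorRatio=8, each survivor space is 4096 MB / (8 + 2) ~= 410 MB,
which matches the ~420 MB I mention below. Something like the
following would give you MTT=1 and ~64 MB survivor spaces; the exact
ratio is of course up to you:

  -XX:MaxTenuringThreshold=1
  -XX:SurvivorRatio=62      (4096 MB / (62 + 2) = 64 MB per survivor space)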
>
> Running with the JDK7 version may take a bit of time, but I will
> pursue that as well.
All you need to do is pull the libjvm.so that is in the JDK 7 installation
(or bundle) and plonk it down into the appropriate directory of your
existing JDK 6u20 installation. We just want to see the results with
the latest JVM, which includes a fix for 6631166.
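A rough sketch of what I mean, assuming a 64-bit Linux/Solaris x64
layout (adjust the /path/to placeholders and directory names for your
actual platform and install locations):

  cd /path/to/jdk1.6.0_20/jre/lib/amd64/server
  cp libjvm.so libjvm.so.6u20      # keep a backup of the 6u20 VM
  cp /path/to/jdk1.7.0/jre/lib/amd64/server/libjvm.so .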
I attached a very rough plot of some metrics extracted from your log
and this behaviour is definitely deserving of a bug, especially
if it can be shown that it happens in the latest JVM. In the plot:
red: scavenge durations
dark blue: promoted data per scavenge
pink: data in survivor space following scavenge
light blue: live data in old gen
As you can see, the scavenge times clearly correlate with the
occupancy of the old gen (as Jon and others indicated).
Did you try Jon's suggestion of doing a manual GC at that
point via jconsole, and seeing if the upward trend of
scavenges continues beyond that?
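(For reference, one way to trigger that -- a sketch, with <pid> standing
for your application's process id:

  jconsole <pid>
  then: MBeans tab -> java.lang -> Memory -> Operations -> gc

and then watch whether the scavenge times drop back down afterwards.)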
Did you use -XX:+UseLargePages and -XX:+AlwaysPreTouch?
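If not, they may be worth trying (a sketch; UseLargePages also needs
large pages configured at the OS level, and AlwaysPreTouch touches all
of the heap pages at JVM startup so that the cost of committing them
does not show up in later pauses):

  -XX:+UseLargePages
  -XX:+AlwaysPreTouch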
Do you have an easy-to-run test case that you can share with us via
your support channels? If/when you do so, please copy me and
send them a reference to this thread on this mailing list.
later, with your new data.
-- ramki
>
> Matt
>
>
>
> On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
>> Hi Matt -- I am computing some metrics from your log file
>> and would like to know how many cpus you have for the logs below?
>>
>> Also, as you noted, almost anything that survives a scavenge
>> lives for a while. To reduce the overhead of unnecessary
>> back-and-forth copying in the survivor spaces, just use
>> MaxTenuringThreshold=1 (This suggestion was also made by
>> several others in the thread, and is corroborated by your
>> PrintTenuringDistribution data). Since you have fairly large survivor
>> spaces configured now (at least large enough to fit 4 age cohorts,
>> which will be down to 1 age cohort if you use MTT=1), I'd
>> suggest making your survivor spaces smaller, maybe down to
>> about 64 MB from the current 420 MB each, and give the excess
>> to your Eden space.
>>
>> Then use 6u21 when it comes out (or ask your Java support to
>> send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into
>> your 6u20 installation, and run with that. If you still see
>> rising pause times, let me know or file a bug, and send us the
>> log file and JVM options along with full platform information.
>>
>> I'll run some metrics from your log file once you send me the
>> platform info requested above; that may reveal a few more secrets.
>>
>> later.
>> -- ramki
>>
>> On 05/12/10 15:19, Matt Fowles wrote:
>>> All~
>>>
>>> I have a large app that produces ~4g of garbage every 30 seconds and
>>> am trying to reduce the size of gc outliers. About 99% of this data
>>> is garbage, but almost anything that survives one collection survives
>>> for an indeterminately long amount of time. We are currently using
>>> the following VM and options:
>>>
>>> java version "1.6.0_20"
>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>
>>> -verbose:gc
>>> -XX:+PrintGCTimeStamps
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCTaskTimeStamps
>>> -XX:+PrintTenuringDistribution
>>> -XX:+PrintCommandLineFlags
>>> -XX:+PrintReferenceGC
>>> -Xms32g -Xmx32g -Xmn4g
>>> -XX:+UseParNewGC
>>> -XX:ParallelGCThreads=4
>>> -XX:+UseConcMarkSweepGC
>>> -XX:ParallelCMSThreads=4
>>> -XX:CMSInitiatingOccupancyFraction=60
>>> -XX:+UseCMSInitiatingOccupancyOnly
>>> -XX:+CMSParallelRemarkEnabled
>>> -XX:MaxGCPauseMillis=50
>>> -Xloggc:gc.log
>>>
>>>
>>> As you can see from the GC log, we never actually reach the point
>>> where the CMS kicks in (after app startup). But our young gens seem
>>> to take increasingly long to collect as time goes by.
>>>
>>> The steady state of the app is reached around 956.392 seconds into the
>>> log, with a collection that takes 0.106 seconds. Thereafter the survivor
>>> space remains roughly constantly filled and the amount promoted to
>>> old gen also remains constant, but the collection times increase to
>>> 2.855 seconds by the end of the 3.5 hour run.
>>>
>>> Has anyone seen this sort of behavior before? Are there more switches
>>> that I should try running with?
>>>
>>> Obviously, I am working to profile the app and reduce the garbage load
>>> in parallel. But if I still see this sort of problem, it is only a
>>> question of how long the app must run before I see unacceptable
>>> latency spikes.
>>>
>>> Matt
>>>
>>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rough_plot.gif
Type: image/gif
Size: 16547 bytes
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/acdad9eb/attachment-0001.gif