Growing GC Young Gen Times
Matt Fowles
matt.fowles at gmail.com
Fri May 14 11:29:55 PDT 2010
Ramki~
Attached are 3 different runs with slightly tweaked VM settings based
on suggestions from this list and others.
All of them have reduced the MaxTenuringThreshold to 2.
gc1.log reduces the young gen size to 1g and the initial old gen size to
7g. As it shows, the young gen collection times do not improve after the
CMS collection that occurs partway through the run.
gc2.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to
the settings from gc1.log.
gc3.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to
a 4g young gen with the original 32g total heap.
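For reference, the flag changes for the three runs were roughly as follows
(reconstructed from the descriptions above, assuming the 7g initial old gen
plus the 1g young gen means an 8g total heap; all other flags are unchanged
from the original set quoted at the bottom of this thread, so treat these as
a sketch rather than the exact command lines):
gc1: -Xms8g -Xmx8g -Xmn1g -XX:MaxTenuringThreshold=2
gc2: -Xms8g -Xmx8g -Xmn1g -XX:MaxTenuringThreshold=2 -XX:+UseLargePages -XX:+AlwaysPreTouch
gc3: -Xms32g -Xmx32g -Xmn4g -XX:MaxTenuringThreshold=2 -XX:+UseLargePages -XX:+AlwaysPreTouch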
Due to the quirks of the infrastructure running these tests, data
volumes (and hence allocation rates) are NOT comparable between runs.
Because the tests take ~4 hours to run, we run several tests in
parallel against different sources of data.
Matt
PS - Due to attachment size restrictions I am going to send each file
in its own email.
On Fri, May 14, 2010 at 1:23 PM, Y. Srinivas Ramakrishna
<y.s.ramakrishna at oracle.com> wrote:
> On 05/14/10 10:07, Matt Fowles wrote:
>>
>> Ramki~
>>
>> The machine has 4 CPUs, each of which has 4 cores. I will adjust the
>
> Great, thanks. I'd suggest making ParallelGCThreads=8. Also compare with
> -XX:-UseParNewGC. If it's the kind of fragmentation that we
> believe may be the cause here, you'd see larger GC times in the
> latter case, but they would not increase as they do now. But that
> is conjecture at this point.
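> Concretely, the two configurations to compare would look roughly like
> this (illustrative only; keep the rest of your flags as they are):
>
>   run A: -XX:+UseParNewGC -XX:ParallelGCThreads=8
>   run B: -XX:-UseParNewGC
>
> With -XX:-UseParNewGC the young gen is collected by the serial
> (DefNew) collector, so ParallelGCThreads no longer matters for the
> scavenge itself; the only question is whether the pause times still
> grow over the course of the run.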
>
>> survivor spaces as you suggest. Previously I had been running with
>> MTT 0, but changed it to 4 at the suggestion of others.
>
> MTT=0 can give very poor performance; as people said, MTT=4
> would definitely be better here than MTT=0.
> You should use MTT=1 here, though.
>
>>
>> Running with the JDK7 version may take a bit of time, but I will
>> pursue that as well.
>
> All you need to do is take the libjvm.so that is in the JDK 7 installation
> (or bundle) and drop it into the appropriate directory of your
> existing JDK 6u20 installation. We just want to see the results with
> the latest JVM, which includes a fix for 6631166.
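> On a 64-bit Linux installation that usually comes down to something
> like the following (paths are illustrative; adjust them to your actual
> layout and keep a backup of the original library):
>
>   cp $JDK6_HOME/jre/lib/amd64/server/libjvm.so \
>      $JDK6_HOME/jre/lib/amd64/server/libjvm.so.bak
>   cp $JDK7_HOME/jre/lib/amd64/server/libjvm.so \
>      $JDK6_HOME/jre/lib/amd64/server/libjvm.so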
>
> I have attached a very rough plot of some metrics extracted from your log,
> and this behaviour definitely deserves a bug report, especially
> if it can be shown that it still happens in the latest JVM. In the plot:
>
> red: scavenge durations
> dark blue: promoted data per scavenge
> pink: data in survivor space following scavenge
> light blue: live data in old gen
>
> As you can see, the scavenge durations clearly correlate with the
> occupancy of the old gen (as Jon and others indicated).
> Did you try Jon's suggestion of doing a manual GC at that
> point via jconsole, and seeing if the upward trend of
> scavenge times continues beyond that?
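> (In jconsole that is the "Perform GC" button on the Memory tab, or the
> gc() operation on the java.lang:type=Memory MBean. If attaching
> jconsole is awkward in your environment, jmap -histo:live <pid> will
> also force a full collection as a side effect of taking the live
> histogram.)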
>
> Did you use -XX:+UseLargePages and -XX:+AlwaysPreTouch?
>
> Do you have an easily used test case that you can share with us via
> your support channels? If/when you do so, please copy me and
> send them a reference to this thread on this mailing list.
>
> later, with your new data.
> -- ramki
>
>>
>> Matt
>>
>>
>>
>> On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna
>> <y.s.ramakrishna at oracle.com> wrote:
>>>
>>> Hi Matt -- I am computing some metrics from your log file
>>> and would like to know how many CPUs you have for the logs below?
>>>
>>> Also, as you noted, almost anything that survives a scavenge
>>> lives for a while. To reduce the overhead of unnecessary
>>> back-and-forth copying in the survivor spaces, just use
>>> MaxTenuringThreshold=1 (This suggestion was also made by
>>> several others in the thread, and is corroborated by your
>>> PrintTenuringDistribution data). Since you have fairly large survivor
>>> spaces configured now (at least large enough to fit 4 age cohorts,
>>> which will be down to 1 age cohort if you use MTT=1), I'd
>>> suggest making your survivor spaces smaller, maybe down to
>>> about 64 MB from the current 420 MB each, and give the excess
>>> to your Eden space.
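>>> As a rough worked example (the numbers assume the original -Xmn4g
>>> young gen; scale them for whatever size you settle on): with -Xmn4g
>>> the default -XX:SurvivorRatio=8 would give 4096M/(8+2), roughly
>>> 410 MB per survivor space, in line with the ~420 MB you have now.
>>> To get roughly 64 MB survivors instead you would use something like
>>>
>>>   -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=62
>>>
>>> since each survivor space is then 4096/(62+2) = 64 MB and the
>>> remaining 3968 MB goes to Eden.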
>>>
>>> Then use 6u21 when it comes out (or ask your Java support to
>>> send you a 6u21 for a beta test), or drop a JVM from JDK 7 into
>>> your 6u20 installation, and run with that. If you still see
>>> rising pause times let me know or file a bug, and send us the
>>> log file and JVM options along with full platform information.
>>>
>>> I'll run some metrics from your log file if you send me the platform
>>> info requested above, and that may perhaps reveal a few more secrets.
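>>> (If you want to eyeball the trend yourself in the meantime, the
>>> wall-clock pause times are easy to pull out of a +PrintGCDetails log;
>>> for instance something along the lines of
>>>
>>>   grep -o 'real=[0-9.]*' gc.log
>>>
>>> lists the real= values from the [Times: ...] entries in log order,
>>> for minor and CMS events alike.)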
>>>
>>> later.
>>> -- ramki
>>>
>>> On 05/12/10 15:19, Matt Fowles wrote:
>>>>
>>>> All~
>>>>
>>>> I have a large app that produces ~4g of garbage every 30 seconds and
>>>> am trying to reduce the size of GC pause outliers. About 99% of this data
>>>> is garbage, but almost anything that survives one collection survives
>>>> for an indeterminately long amount of time. We are currently using
>>>> the following VM and options:
>>>>
>>>> java version "1.6.0_20"
>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>>
>>>> -verbose:gc
>>>> -XX:+PrintGCTimeStamps
>>>> -XX:+PrintGCDetails
>>>> -XX:+PrintGCTaskTimeStamps
>>>> -XX:+PrintTenuringDistribution
>>>> -XX:+PrintCommandLineFlags
>>>> -XX:+PrintReferenceGC
>>>> -Xms32g -Xmx32g -Xmn4g
>>>> -XX:+UseParNewGC
>>>> -XX:ParallelGCThreads=4
>>>> -XX:+UseConcMarkSweepGC
>>>> -XX:ParallelCMSThreads=4
>>>> -XX:CMSInitiatingOccupancyFraction=60
>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>> -XX:+CMSParallelRemarkEnabled
>>>> -XX:MaxGCPauseMillis=50
>>>> -Xloggc:gc.log
>>>>
>>>>
>>>> As you can see from the GC log, we never actually reach the point
>>>> where the CMS kicks in (after app startup). But our young gen
>>>> collections seem to take increasingly long as time goes by.
>>>>
>>>> The steady state of the app is reached around 956.392 seconds into the
>>>> log with a collection that takes 0.106 seconds. Thereafter the survivor
>>>> space occupancy remains roughly constant and the amount promoted to the
>>>> old gen also remains constant, but the collection times increase to
>>>> 2.855 seconds by the end of the 3.5-hour run.
>>>>
>>>> Has anyone seen this sort of behavior before? Are there more switches
>>>> that I should try running with?
>>>>
>>>> Obviously, I am working to profile the app and reduce the garbage load
>>>> in parallel. But if I still see this sort of problem, it is only a
>>>> question of how long the app must run before I see unacceptable
>>>> latency spikes.
>>>>
>>>> Matt
>>>>
>>>>
>>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: r229-vm-tweaks-gc1.log.bz2
Type: application/x-bzip2
Size: 45158 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/34386d25/attachment-0001.bin