Growing GC Young Gen Times

Matt Fowles matt.fowles at gmail.com
Fri May 14 11:30:14 PDT 2010


Ramki~

File 2.

Matt

On Fri, May 14, 2010 at 2:29 PM, Matt Fowles <matt.fowles at gmail.com> wrote:
> Ramki~
>
> Attached are 3 different runs with slightly tweaked VM settings based
> on suggestions from this list and others.
>
> All of them have reduced the MaxTenuringThreshold to 2.
>
> gc1.log reduces the young gen size to 1g and the old gen size to 7g
> initially.  As you can see from it, the young gen collection times do not
> improve after the CMS sweep that occurs partway through the run.
> gc2.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to
> the settings from gc1.log.
> gc3.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to
> a 4g young gen with the original 32g total heap.
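>
> For reference, a rough sketch of how the three runs differ (the exact
> -Xms/-Xmx values below are my shorthand for the sizes described above,
> not the literal command lines):
>
>     gc1.log:  -Xmn1g, old gen ~7g initially (so roughly -Xms8g),
>               -XX:MaxTenuringThreshold=2
>     gc2.log:  gc1.log settings plus -XX:+UseLargePages -XX:+AlwaysPreTouch
>     gc3.log:  -Xmn4g -Xms32g -Xmx32g -XX:MaxTenuringThreshold=2
>               plus -XX:+UseLargePages -XX:+AlwaysPreTouch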
>
> Due to the quirks of the infrastructure running these tests, data
> volumes (and hence allocation rates) are NOT comparable between runs.
> Because the tests take ~4 hours to run, we run several tests in
> parallel against different sources of data.
>
> Matt
>
> PS - Due to attachment size restrictions I am going to send each file
> in its own email.
>
> On Fri, May 14, 2010 at 1:23 PM, Y. Srinivas Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
>> On 05/14/10 10:07, Matt Fowles wrote:
>>>
>>> Ramki~
>>>
>>> The machine has 4 CPUs, each of which has 4 cores.  I will adjust the
>>
>> Great, thanks. I'd suggest making ParallelGCThreads=8. Also compare with
>> -XX:-UseParNewGC: if it's the kind of fragmentation that we
>> believe may be the cause here, you'd see larger GC times in the
>> latter case, but they would not increase as they do now. But that
>> is conjecture at this point.
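>>
>> Concretely, something along these lines (keeping the rest of your
>> flags as they are; this is just a sketch of the comparison, not a
>> recommended production setting):
>>
>>     run A:  -XX:+UseParNewGC -XX:ParallelGCThreads=8
>>     run B:  -XX:-UseParNewGC
>>
>> Run B falls back to the single-threaded young-gen collector and is
>> purely diagnostic.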
>>
>>> survivor spaces as you suggest.  Previously I had been running with
>>> MTT 0, but changed it to 4 at the suggestion of others.
>>
>> MTT=0 can give very poor performance; as people said, MTT=4
>> would definitely be better here than MTT=0.
>> You should use MTT=1 here, though.
>>
>>>
>>> Running with the JDK7 version may take a bit of time, but I will
>>> pursue that as well.
>>
>> All you need to do is pull the libjvm.so from the JDK 7 installation
>> (or bundle) and plonk it down into the appropriate directory of your
>> existing JDK 6u20 installation. We just want to see the results with
>> the latest JVM, which includes a fix for 6631166.
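>>
>> On a typical Linux x86_64 layout that amounts to something like the
>> following (back up the original first, and adjust the paths to match
>> your actual installation directories):
>>
>>     cd $JDK6U20_HOME/jre/lib/amd64/server
>>     cp libjvm.so libjvm.so.6u20.orig
>>     cp $JDK7_HOME/jre/lib/amd64/server/libjvm.so .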
>>
>> I attached a very rough plot of some metrics extracted from your log
>> and this behaviour is definitely deserving of a bug, especially
>> if it can be shown that it happens in the latest JVM. In the plot:
>>
>>  red: scavenge durations
>>  dark blue: promoted data per scavenge
>>  pink: data in survivor space following scavenge
>>  light blue: live data in old gen
>>
>> As you can see, the scavenge times clearly correlate with the
>> occupancy of the old gen (as Jon and others indicated).
>> Did you try Jon's suggestion of doing a manual GC at that
>> point via jconsole, and seeing whether the upward trend of
>> scavenge times continues beyond that?
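>>
>> (In jconsole that would be the "Perform GC" button on the Memory tab,
>> or the gc operation on the java.lang:type=Memory MBean. As a
>> command-line alternative, something like
>>
>>     jmap -histo:live <pid>
>>
>> should also force a full collection as a side effect.)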
>>
>> Did you use -XX:+UseLargePages and -XX:+AlwaysPreTouch?
>>
>> Do you have an easily used test case that you can share with us via
>> your support channels? If/when you do so, please copy me and
>> send them a reference to this thread on this mailing list.
>>
>> later, with your new data.
>> -- ramki
>>
>>>
>>> Matt
>>>
>>>
>>>
>>> On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna
>>> <y.s.ramakrishna at oracle.com> wrote:
>>>>
>>>> Hi Matt -- I am computing some metrics from your log file
>>>> and would like to know how many CPUs you have for the logs below.
>>>>
>>>> Also, as you noted, almost anything that survives a scavenge
>>>> lives for a while. To reduce the overhead of unnecessary
>>>> back-and-forth copying in the survivor spaces, just use
>>>> MaxTenuringThreshold=1 (This suggestion was also made by
>>>> several others in the thread, and is corroborated by your
>>>> PrintTenuringDistribution data). Since you have fairly large survivor
>>>> spaces configured now (at least large enough to fit 4 age cohorts,
>>>> which will be down to 1 age cohort if you use MTT=1), I'd
>>>> suggest making your survivor spaces smaller, maybe down to
>>>> about 64 MB from the current 420 MB each, and giving the excess
>>>> to your Eden space.
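>>>>
>>>> With your current -Xmn4g, that works out to roughly the following
>>>> (the exact ratio is an approximation on my part; adjust it if you
>>>> change -Xmn):
>>>>
>>>>     -XX:MaxTenuringThreshold=1
>>>>     -XX:SurvivorRatio=62
>>>>
>>>> since each survivor space is young_gen / (SurvivorRatio + 2),
>>>> i.e. 4096m / 64 = 64m, with the rest going to Eden.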
>>>>
>>>> Then use 6u21 when it comes out (or ask your Java support to
>>>> send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into
>>>> your 6u20 installation, and run with that. If you still see
>>>> rising pause times, let me know or file a bug, and send us the
>>>> log file and JVM options along with full platform information.
>>>>
>>>> I'll run some metrics from your log file once you send me the
>>>> platform info requested above; that may reveal a few more secrets.
>>>>
>>>> later.
>>>> -- ramki
>>>>
>>>> On 05/12/10 15:19, Matt Fowles wrote:
>>>>>
>>>>> All~
>>>>>
>>>>> I have a large app that allocates ~4g of data every 30 seconds, and I
>>>>> am trying to reduce the size of GC outliers.  About 99% of this data
>>>>> is garbage, but almost anything that survives one collection survives
>>>>> for an indeterminately long time.  We are currently using
>>>>> the following VM and options:
>>>>>
>>>>> java version "1.6.0_20"
>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>>>
>>>>>              -verbose:gc
>>>>>              -XX:+PrintGCTimeStamps
>>>>>              -XX:+PrintGCDetails
>>>>>              -XX:+PrintGCTaskTimeStamps
>>>>>              -XX:+PrintTenuringDistribution
>>>>>              -XX:+PrintCommandLineFlags
>>>>>              -XX:+PrintReferenceGC
>>>>>              -Xms32g -Xmx32g -Xmn4g
>>>>>              -XX:+UseParNewGC
>>>>>              -XX:ParallelGCThreads=4
>>>>>              -XX:+UseConcMarkSweepGC
>>>>>              -XX:ParallelCMSThreads=4
>>>>>              -XX:CMSInitiatingOccupancyFraction=60
>>>>>              -XX:+UseCMSInitiatingOccupancyOnly
>>>>>              -XX:+CMSParallelRemarkEnabled
>>>>>              -XX:MaxGCPauseMillis=50
>>>>>              -Xloggc:gc.log
>>>>>
>>>>>
>>>>> As you can see from the GC log, we never actually reach the point
>>>>> where the CMS kicks in (after app startup).  But our young gen
>>>>> collections seem to take increasingly long as time goes by.
>>>>>
>>>>> The steady state of the app is reached around 956.392 seconds into the
>>>>> log, with a collection that takes 0.106 seconds.  Thereafter the
>>>>> survivor space occupancy remains roughly constant and the amount
>>>>> promoted to the old gen also remains constant, but the collection
>>>>> times increase to 2.855 seconds by the end of the 3.5-hour run.
>>>>>
>>>>> Has anyone seen this sort of behavior before?  Are there more switches
>>>>> that I should try running with?
>>>>>
>>>>> Obviously, I am working to profile the app and reduce the garbage load
>>>>> in parallel.  But if I still see this sort of problem, it is only a
>>>>> question of how long the app must run before I see unacceptable
>>>>> latency spikes.
>>>>>
>>>>> Matt
>>>>>
>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: r229-vm-tweaks-gc2.log.bz2
Type: application/x-bzip2
Size: 30955 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/2a476a93/attachment-0001.bin 

