G1GC Full GCs
Todd Lipcon
todd at cloudera.com
Tue Jul 6 17:41:14 PDT 2010
On Tue, Jul 6, 2010 at 4:57 PM, Y. S. Ramakrishna <y.s.ramakrishna at oracle.com> wrote:
> Two questions:
>
> (1) do you enable class unloading with CMS? (I do not see that
> below in your option list, but wondered.)
>
>
We don't enable class unloading.
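
(For the record, turning it on would - as far as I know; this is a sketch of
the relevant flags rather than something we've tested - look roughly like:

    -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled

and I believe older Java 6 builds also wanted -XX:+CMSPermGenSweepingEnabled
alongside it.)
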
> (2) does your application load classes, or intern a lot of strings?
>
We do a little bit of class loading at startup - we have some basic
informational web UIs that are in Jetty servlets, and I think Jetty uses its
own ClassLoader. It's nothing fancy, though, and nothing should get
unloaded/reloaded. Perm usually stabilizes around 30MB, if I recall correctly,
when I use CMS.
> If I am reading the logs right, G1 appears to reclaim less
> and less of the heap in each cycle until a full collection
> intervenes, and I have no real explanation for this behaviour
> except that perhaps there's something in the perm gen that
> keeps stuff in the remainder of the heap artificially live.
> G1 does not incrementally collect the perm gen, so this is
> plausible. But CMS does not either by default, and I do not see
> that option in the CMS options list you gave below.
We don't enable permgen collection with CMS - the options I gave were from
production instances.
> It would
> be instructive to see what the comparable CMS logs look like.
> Maybe then you could start with the same heap shapes for the
> two and see if you can get to the bottom of the full gc (which,
> as I understand, you get to more quickly w/G1 than you did
> w/CMS).
>
Yep, I'll kick off some CMS tests this evening and get back to you with
those logs.
Thanks for the help, all.
-Todd
>
> On 07/06/10 14:24, Todd Lipcon wrote:
>
>> On Tue, Jul 6, 2010 at 2:09 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>
>> Todd,
>>
>> Could you send a segment of the GC logs from the beginning
>> through the first dozen or so full GC's?
>>
>>
>> Sure, I just put it online at:
>>
>> http://cloudera-todd.s3.amazonaws.com/gc-log-g1gc.txt
>>
>>
>> Exactly which version of the JVM are you using?
>>
>> java -version
>>
>> will tell us.
>>
>>
>> Latest as of last night:
>>
>> [todd at monster01 ~]$ ./jdk1.7.0/jre/bin/java -version
>> java version "1.7.0-ea"
>> Java(TM) SE Runtime Environment (build 1.7.0-ea-b99)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b03, mixed mode)
>>
>> Do you have a test setup where you could do some experiments?
>>
>>
>> Sure, I have a five node cluster here where I do lots of testing, happy to
>> try different builds/options/etc (though I probably don't have time to apply
>> patches and rebuild the JDK myself)
>>
>> Can you send the set of CMS flags you use? It might tell
>> us something about the GC behavior of your application.
>> It might not tell us anything, but it's worth a look.
>>
>>
>> Different customers have found different flags to work well for them. One
>> user uses the following:
>>
>>
>> -Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \
>> -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=40 \
>>
>>
>> Another uses:
>>
>>
>> -XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC
>> -XX:NewSize=64m -XX:MaxNewSize=64m
>> -XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails
>>
>> The particular tuning options probably depend on the actual cache workload
>> of the user. I tend to recommend a CMSInitiatingOccupancyFraction around 75
>> or so, since the software maintains about 60% heap usage. I also think a
>> slightly larger NewSize would improve things a bit, but once it gets beyond
>> 256m or so, the ParNew pauses start to be too long for a lot of use cases.
>>
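>> To make that concrete, a hypothetical starting point along those lines
>> (my guess at reasonable numbers, not a validated recommendation) might
>> look like:
>>
>> -Xmx8000m -XX:+UseConcMarkSweepGC \
>> -XX:NewSize=128m -XX:MaxNewSize=128m \
>> -XX:CMSInitiatingOccupancyFraction=75 \
>> -verbose:gc -XX:+PrintGCDetails
>>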
>> Regarding CMS logs, I can probably restart this test later this afternoon
>> on CMS and run it for a couple hours, but it isn't likely to hit the
>> multi-minute compaction that quickly. It happens more in the wild.
>>
>> -Todd
>>
>>
>>
>> On 07/06/10 13:27, Todd Lipcon wrote:
>>
>>> Hi all,
>>>
>>> I work on HBase, a distributed database written in Java. We
>>> generally run on large heaps (8GB+), and our object lifetime
>>> distribution has proven pretty problematic for garbage collection
>>> (we manage a multi-GB LRU cache inside the process, so in CMS we
>>> tenure a lot of byte arrays which later get collected).
>>>
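>>> (To make the lifetime pattern concrete, here is a purely illustrative
>>> sketch - invented names, not our actual cache code. Blocks sit in the
>>> cache long enough to be tenured, and eviction then turns them into
>>> old-gen garbage all at once:
>>>
>>> import java.util.Iterator;
>>> import java.util.LinkedHashMap;
>>> import java.util.Map;
>>>
>>> class BlockCache {
>>>     private final long capacityBytes;
>>>     private long usedBytes = 0;
>>>     // Access-ordered map gives us LRU iteration order.
>>>     private final LinkedHashMap<String, byte[]> map =
>>>         new LinkedHashMap<String, byte[]>(16, 0.75f, true);
>>>
>>>     BlockCache(long capacityBytes) { this.capacityBytes = capacityBytes; }
>>>
>>>     synchronized void put(String key, byte[] block) {
>>>         map.put(key, block);
>>>         usedBytes += block.length;
>>>         // Evict from the LRU end. The evicted byte[]s have usually been
>>>         // tenured by now, so each eviction creates old-gen garbage.
>>>         Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
>>>         while (usedBytes > capacityBytes && it.hasNext()) {
>>>             usedBytes -= it.next().getValue().length;
>>>             it.remove();
>>>         }
>>>     }
>>>
>>>     synchronized byte[] get(String key) { return map.get(key); }
>>> }
>>>
>>> The GC-relevant point is just the churn: multi-MB arrays that live for
>>> minutes and then die in the old generation.)
>>>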
>>> In Java6, we generally run with the CMS collector, tuning down
>>> CMSInitiatingOccupancyFraction and constraining MaxNewSize to
>>> achieve fairly low pause GC, but after a week or two of uptime we
>>> often run into full heap compaction which takes several minutes
>>> and wreaks havoc on the system.
>>>
>>> Needless to say, we've been watching the development of the G1 GC
>>> with anticipation for the last year or two. Finally it seems in
>>> the latest build of JDK7 it's stable enough for actual use
>>> (previously it would segfault within an hour or so). However, in
>>> my testing I'm seeing a fair number of 8-10 second Full GC pauses.
>>>
>>> The flags I'm passing are:
>>>
>>> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20
>>> -XX:GCPauseIntervalMillis=80
>>>
>>> Most of the pauses I see in the GC log are around 10-20ms as expected:
>>>
>>> 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000 secs]
>>>    [Parallel Time: 10.5 ms]
>>>       [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 1680080.1 1680080.0 1680079.9 1680081.5]
>>>       [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 1.9 2.5 1.7 1.7 0.1
>>>        Avg: 1.8, Min: 0.1, Max: 2.5]
>>>       [Processed Buffers: 8 1 3 1 1 7 3 2 6 2 7 8 3
>>>        Sum: 52, Avg: 4, Min: 1, Max: 8]
>>>       [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 0.5 0.4 0.5 0.3 0.3 0.0
>>>        Avg: 0.4, Min: 0.0, Max: 0.7]
>>>       [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>>>        Avg: 0.0, Min: 0.0, Max: 0.0]
>>>       [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 0.0 0.8 0.8 0.9
>>>        Avg: 0.5, Min: 0.0, Max: 0.9]
>>>       [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 7.1 6.9 7.1 7.1 7.0
>>>        Avg: 7.1, Min: 6.9, Max: 7.3]
>>>       [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>>>        Avg: 0.0, Min: 0.0, Max: 0.0]
>>>       [Other: 0.6 ms]
>>>    [Clear CT: 0.5 ms]
>>>    [Other: 1.1 ms]
>>>       [Choose CSet: 0.0 ms]
>>>    [ 7677M->7636M(8000M)]
>>>    [Times: user=0.12 sys=0.00, real=0.01 secs]
>>>
>>> But every 5-10 minutes I see a GC pause that lasts 10-15 seconds:
>>> [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail
>>> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M), 9.8907800 secs]
>>> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M), 9.9025520 secs]
>>> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M), 10.1232190 secs]
>>> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M), 10.4997760 secs]
>>> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M), 11.0497380 secs]
>>>
>>> These pauses are pretty unacceptable for soft real-time operation.
>>>
>>> Am I missing some tuning that should be done for G1GC for
>>> applications like this? Is 20ms out of 80ms too aggressive a
>>> target for the garbage rates we're generating?
>>>
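>>> (One experiment I can try is a less aggressive target - the numbers
>>> below are just a guess at giving G1 more headroom, keeping the same
>>> 25% pause budget but with a larger absolute allowance:
>>>
>>> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=50
>>> -XX:GCPauseIntervalMillis=200 )
>>>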
>>> My actual live heap usage should be very stable around 5GB - the
>>> application very carefully accounts its live object set at around
>>> 60% of the max heap (as you can see in the logs above).
>>>
>>> At this point we are considering doing crazy things like ripping
>>> out our main memory consumers into a custom slab allocator and
>>> manually reference-counting the byte array slices. But if we can get
>>> G1GC to work for us, it will save a lot of engineering on the
>>> application side!
>>>
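>>> (Roughly the kind of thing we have in mind - a purely hypothetical
>>> sketch with invented names, not a design we've committed to: keep the
>>> cached bytes in a few long-lived slabs the GC never has to copy, and
>>> recycle them by reference count instead of by collection:
>>>
>>> import java.nio.ByteBuffer;
>>> import java.util.concurrent.ConcurrentLinkedQueue;
>>> import java.util.concurrent.atomic.AtomicInteger;
>>>
>>> class Slab {
>>>     final ByteBuffer buf;  // one big, long-lived allocation
>>>     private final AtomicInteger refCount = new AtomicInteger(1);
>>>     private final ConcurrentLinkedQueue<Slab> pool;
>>>
>>>     Slab(int size, ConcurrentLinkedQueue<Slab> pool) {
>>>         // Direct buffer: the payload lives outside the Java heap entirely.
>>>         this.buf = ByteBuffer.allocateDirect(size);
>>>         this.pool = pool;
>>>     }
>>>
>>>     void retain() { refCount.incrementAndGet(); }
>>>
>>>     void release() {
>>>         // When the last slice lets go, recycle the slab into the pool
>>>         // instead of letting multi-MB arrays die in the old gen.
>>>         if (refCount.decrementAndGet() == 0) {
>>>             buf.clear();
>>>             refCount.set(1);
>>>             pool.offer(this);
>>>         }
>>>     }
>>> }
>>>
>>> The point is that the collector then only ever sees small, short-lived
>>> slice wrappers, never the multi-MB payloads.)
>>>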
>>> Thanks
>>> -Todd
>>>
>>> -- Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
--
Todd Lipcon
Software Engineer, Cloudera