G1GC Full GCs

Todd Lipcon todd at cloudera.com
Tue Jul 6 14:24:27 PDT 2010


On Tue, Jul 6, 2010 at 2:09 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:

>  Todd,
>
> Could you send a segment of the GC logs from the beginning
> through the first dozen or so full GCs?
>

Sure, I just put it online at:

http://cloudera-todd.s3.amazonaws.com/gc-log-g1gc.txt


>
> Exactly which version of the JVM are you using?
>
>     java -version
>
> will tell us.
>
>
Latest as of last night:

[todd at monster01 ~]$ ./jdk1.7.0/jre/bin/java -version
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b99)
Java HotSpot(TM) 64-Bit Server VM (build 19.0-b03, mixed mode)


> Do you have a test setup where you could do some experiments?
>
>
Sure, I have a five-node cluster here where I do lots of testing, and I'm
happy to try different builds/options/etc. (though I probably don't have time
to apply patches and rebuild the JDK myself).


> Can you send the set of CMS flags you use?  It might tell
> us something about the GC behavior of your application.
> Might not tell us anything but it's worth a look.
>

Different customers have found different flags to work well for them. One
user uses the following:


-Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \
-XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=40 \


Another uses:


-XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC
-XX:NewSize=64m -XX:MaxNewSize=64m
-XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails



The particular tuning options probably depend on the user's actual cache
workload. I tend to recommend a CMSInitiatingOccupancyFraction around 75 or
so, since the software maintains about 60% heap usage. I also think a slightly
larger NewSize would improve things a bit, but once it gets beyond 256m or so,
the ParNew pauses start to be too long for a lot of use cases.
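
For reference, a combined command line reflecting those recommendations might
look like the following (the specific sizes here are illustrative starting
points rather than a vetted configuration):

-Xmx8000m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
-XX:NewSize=128m -XX:MaxNewSize=128m \
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps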

Regarding CMS logs, I can probably restart this test on CMS later this
afternoon and run it for a couple of hours, but it isn't likely to hit the
multi-minute compaction that quickly. It happens more in the wild.

-Todd


>
> On 07/06/10 13:27, Todd Lipcon wrote:
>
> Hi all,
>
>  I work on HBase, a distributed database written in Java. We generally run
> on large heaps (8GB+), and our object lifetime distribution has proven
> pretty problematic for garbage collection (we manage a multi-GB LRU cache
> inside the process, so in CMS we tenure a lot of byte arrays which later get
> collected).
>
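As an illustration of the workload being described here (not HBase's actual
block cache implementation), such a cache boils down to a map of large byte[]
blocks with LRU eviction, so each block survives many young collections, gets
tenured, and only becomes garbage once it is evicted:

import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU block cache: large byte[] values live until evicted, so they tenure. */
class BlockCache extends LinkedHashMap<String, byte[]> {
    private final long capacityBytes;
    private long usedBytes;

    BlockCache(long capacityBytes) {
        super(16, 0.75f, true);            // access-order iteration gives LRU behavior
        this.capacityBytes = capacityBytes;
    }

    synchronized void cacheBlock(String key, byte[] block) {
        usedBytes += block.length;
        put(key, block);
    }

    synchronized byte[] getBlock(String key) {
        return get(key);                   // get() moves the entry to most-recently-used
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        if (usedBytes > capacityBytes) {   // over budget: evict the LRU block;
            usedBytes -= eldest.getValue().length;
            return true;                   // the evicted byte[] finally becomes garbage
        }
        return false;
    }
}
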
>  In Java6, we generally run with the CMS collector, tuning down
> CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve fairly
> low pause GC, but after a week or two of uptime we often run into full heap
> compaction which takes several minutes and wreaks havoc on the system.
>
>  Needless to say, we've been watching the development of the G1 GC with
> anticipation for the last year or two. Finally it seems in the latest build
> of JDK7 it's stable enough for actual use (previously it would segfault
> within an hour or so). However, in my testing I'm seeing a fair amount of
> 8-10 second Full GC pauses.
>
>  The flags I'm passing are:
>
>  -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> -XX:GCPauseIntervalMillis=80
>
>  Most of the pauses I see in the GC log are around 10-20ms as expected:
>
>  2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000
> secs]
>    [Parallel Time:  10.5 ms]
>       [Update RS (Start) (ms):  1680080.2  1680080.1  1680080.2  1680079.9
>  1680080.0  1680080.2  1680080.1  1680080.1  1680080.0  1680080.1  1680080.0
>  1680079.9  1680081.5]
>       [Update RS (ms):  1.4  2.0  2.2  1.8  1.7  1.4  2.5  2.2  1.9  2.5
>  1.7  1.7  0.1
>        Avg:   1.8, Min:   0.1, Max:   2.5]
>          [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3
>           Sum: 52, Avg: 4, Min: 1, Max: 8]
>       [Ext Root Scanning (ms):  0.7  0.5  0.5  0.3  0.5  0.7  0.4  0.5  0.4
>  0.5  0.3  0.3  0.0
>        Avg:   0.4, Min:   0.0, Max:   0.7]
>       [Mark Stack Scanning (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
>  0.0  0.0  0.0  0.0  0.0
>        Avg:   0.0, Min:   0.0, Max:   0.0]
>       [Scan RS (ms):  0.9  0.4  0.1  0.7  0.7  0.9  0.0  0.1  0.6  0.0  0.8
>  0.8  0.9
>        Avg:   0.5, Min:   0.0, Max:   0.9]
>       [Object Copy (ms):  7.2  7.2  7.3  7.3  7.1  7.1  7.0  7.2  7.1  6.9
>  7.1  7.1  7.0
>        Avg:   7.1, Min:   6.9, Max:   7.3]
>       [Termination (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
>  0.0  0.0  0.0
>        Avg:   0.0, Min:   0.0, Max:   0.0]
>       [Other:   0.6 ms]
>    [Clear CT:   0.5 ms]
>    [Other:   1.1 ms]
>       [Choose CSet:   0.0 ms]
>    [ 7677M->7636M(8000M)]
>  [Times: user=0.12 sys=0.00, real=0.01 secs]
>
>  But every 5-10 minutes I see a GC pause that lasts 10-15 seconds:
>  [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail
> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M),
> 9.8907800 secs]
> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M),
> 9.9025520 secs]
> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M),
> 10.1232190 secs]
> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M),
> 10.4997760 secs]
> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M),
> 11.0497380 secs]
>
>  These pauses are pretty unacceptable for soft real time operation.
>
>  Am I missing some tuning that should be done for G1GC for applications
> like this? Is 20ms out of 80ms too aggressive a target for the garbage rates
> we're generating?
>
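(For reference, MaxGCPauseMillis=20 with GCPauseIntervalMillis=80 asks G1 to
keep stop-the-world time to at most 20 ms within any 80 ms window, i.e. a GC
time budget of roughly 25% of wall-clock time.)
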
>  My actual live heap usage should be very stable around 5GB - the
> application very carefully accounts its live object set at around 60% of the
> max heap (as you can see in the logs above).
>
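(60% of the 8000M heap is roughly 4800M, which lines up with the ~4.8-5.0 GB
that the full GCs above collect down to.)
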
>  At this point we are considering doing crazy things like ripping out our
> main memory consumers into a custom slab allocator, and manually reference
> count the byte array slices. But, if we can get G1GC to work for us, it will
> save a lot of engineering on the application side!
>
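For context, the sort of scheme being alluded to usually means carving
fixed-size slices out of a few large, permanently tenured (or off-heap) buffers
and recycling them with explicit reference counts instead of leaving them to
the collector. A rough sketch, with all class and method names hypothetical:

import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy slab allocator: one big off-heap buffer, fixed-size slices, manual ref counts. */
class Slab {
    private final ByteBuffer slab;
    private final int sliceSize;
    private final ConcurrentLinkedQueue<Integer> freeOffsets =
        new ConcurrentLinkedQueue<Integer>();

    Slab(int sliceSize, int numSlices) {
        this.sliceSize = sliceSize;
        this.slab = ByteBuffer.allocateDirect(sliceSize * numSlices); // never moved or copied by GC
        for (int i = 0; i < numSlices; i++) {
            freeOffsets.add(i * sliceSize);
        }
    }

    /** A slice of the slab, returned to the free list when its ref count hits zero. */
    class Slice {
        private final int offset;
        private final AtomicInteger refCount = new AtomicInteger(1);

        private Slice(int offset) { this.offset = offset; }

        ByteBuffer buffer() {
            ByteBuffer view = slab.duplicate();   // independent position/limit over the shared slab
            view.position(offset);
            view.limit(offset + sliceSize);
            return view.slice();
        }

        void retain() { refCount.incrementAndGet(); }

        void release() {
            if (refCount.decrementAndGet() == 0) {
                freeOffsets.add(offset);          // recycle the slice instead of creating garbage
            }
        }
    }

    /** Hands out a slice, or null if the slab is exhausted. */
    Slice allocate() {
        Integer offset = freeOffsets.poll();
        return offset == null ? null : new Slice(offset);
    }
}
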
>  Thanks
> -Todd
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


-- 
Todd Lipcon
Software Engineer, Cloudera