G1GC Full GCs

Ryan Rawson ryanobjc at gmail.com
Tue Jul 6 21:40:28 UTC 2010


I also work with Todd on these systems (I am one of the other people
with the alternate CMS config), and doubling the heap size from 8GB to
16GB is a little insane... we'd like to keep some reasonable memory
efficiency here. The more we can get out of our RAM for this block
cache, the better our systems perform. Also, a lot of the settings are
self-tuning, so if we raise Xmx, the size of the block cache scales
with it.
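
For instance, the block cache is sized as a fraction of the heap at
startup, roughly like this (a minimal sketch - in HBase the fraction
comes from the hfile.block.cache.size setting, and defaults vary by
version):

  // The block cache scales with -Xmx because it is a heap fraction.
  long maxHeap = Runtime.getRuntime().maxMemory(); // reflects -Xmx
  float cacheFraction = 0.2f;  // e.g. hfile.block.cache.size
  long blockCacheBytes = (long) (maxHeap * cacheFraction);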

-ryan

On Tue, Jul 6, 2010 at 2:12 PM, Y. S. Ramakrishna
<y.s.ramakrishna at oracle.com> wrote:
> Did you try doubling the heap size? You might want to post a full
> log so we can see what's happening between those full collections.
> Also, if you have comparable CMS logs, all the better, as a known
> starting point. The full GCs look like the heap got too full, which
> must mean that incremental collection is not keeping up with the
> rate of garbage generation. Also, what JDK build are you running?
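>
> (For reference, a fuller log can be captured with HotSpot's standard
> logging flags, e.g.:
>
>   -Xloggc:gc-hbase.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>   -XX:+PrintGCDateStamps
>
> The log file name here is just an example.)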
>
> -- ramki
>
> On 07/06/10 13:27, Todd Lipcon wrote:
>> Hi all,
>>
>> I work on HBase, a distributed database written in Java. We generally
>> run on large heaps (8GB+), and our object lifetime distribution has
>> proven pretty problematic for garbage collection (we manage a multi-GB
>> LRU cache inside the process, so in CMS we tenure a lot of byte arrays
>> which later get collected).
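>>
>> Conceptually the cache is a bounded, access-ordered map from block
>> key to byte[]. A minimal sketch, not the actual HBase code:
>>
>>   import java.util.LinkedHashMap;
>>   import java.util.Map;
>>
>>   // Evicted byte[]s become old-gen garbage: hot blocks usually live
>>   // long enough to be tenured before they fall out of the cache.
>>   class BlockCacheSketch {
>>       static final int MAX_BLOCKS = 100000; // hypothetical bound
>>       final Map<String, byte[]> cache =
>>           new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
>>               @Override
>>               protected boolean removeEldestEntry(
>>                       Map.Entry<String, byte[]> eldest) {
>>                   return size() > MAX_BLOCKS;
>>               }
>>           };
>>   }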
>>
>> In Java 6, we generally run with the CMS collector, tuning down
>> CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve
>> fairly low GC pauses, but after a week or two of uptime we often run
>> into a full heap compaction that takes several minutes and wreaks
>> havoc on the system.
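>>
>> (A representative set of flags for that configuration, with
>> illustrative values only - the exact numbers vary per deployment:
>>
>>   -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
>>   -XX:CMSInitiatingOccupancyFraction=60 -XX:MaxNewSize=256m
>> )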
>>
>> Needless to say, we've been watching the development of the G1 GC
>> with anticipation for the last year or two. Finally, in the latest
>> build of JDK7, it seems stable enough for actual use (previously it
>> would segfault within an hour or so). However, in my testing I'm
>> seeing a fair number of 8-10 second Full GC pauses.
>>
>> The flags I'm passing are:
>>
>> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=80
>>
>> Most of the pauses I see in the GC log are around 10-20ms as expected:
>>
>> 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000
>> secs]
>>    [Parallel Time:  10.5 ms]
>>       [Update RS (Start) (ms):  1680080.2  1680080.1  1680080.2
>>  1680079.9  1680080.0  1680080.2  1680080.1  1680080.1  1680080.0
>>  1680080.1  1680080.0  1680079.9  1680081.5]
>>       [Update RS (ms):  1.4  2.0  2.2  1.8  1.7  1.4  2.5  2.2  1.9  2.5
>>  1.7  1.7  0.1
>>        Avg:   1.8, Min:   0.1, Max:   2.5]
>>          [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3
>>           Sum: 52, Avg: 4, Min: 1, Max: 8]
>>       [Ext Root Scanning (ms):  0.7  0.5  0.5  0.3  0.5  0.7  0.4  0.5
>>  0.4  0.5  0.3  0.3  0.0
>>        Avg:   0.4, Min:   0.0, Max:   0.7]
>>       [Mark Stack Scanning (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
>>  0.0  0.0  0.0  0.0  0.0
>>        Avg:   0.0, Min:   0.0, Max:   0.0]
>>       [Scan RS (ms):  0.9  0.4  0.1  0.7  0.7  0.9  0.0  0.1  0.6  0.0
>>  0.8  0.8  0.9
>>        Avg:   0.5, Min:   0.0, Max:   0.9]
>>       [Object Copy (ms):  7.2  7.2  7.3  7.3  7.1  7.1  7.0  7.2  7.1
>>  6.9  7.1  7.1  7.0
>>        Avg:   7.1, Min:   6.9, Max:   7.3]
>>       [Termination (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
>>  0.0  0.0  0.0  0.0
>>        Avg:   0.0, Min:   0.0, Max:   0.0]
>>       [Other:   0.6 ms]
>>    [Clear CT:   0.5 ms]
>>    [Other:   1.1 ms]
>>       [Choose CSet:   0.0 ms]
>>    [ 7677M->7636M(8000M)]
>>  [Times: user=0.12 sys=0.00, real=0.01 secs]
>>
>> But every 5-10 minutes I see a GC pause that lasts 10-15 seconds:
>> [todd@monster01 logs]$ grep 'Full GC' gc-hbase.log | tail
>> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M),
>> 9.8907800 secs]
>> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M),
>> 9.9025520 secs]
>> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M),
>> 10.1232190 secs]
>> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M),
>> 10.4997760 secs]
>> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M),
>> 11.0497380 secs]
>>
>> These pauses are pretty unacceptable for soft real-time operation.
>>
>> Am I missing some tuning that should be done for G1GC for applications
>> like this? Is 20ms out of 80ms too aggressive a target for the garbage
>> rates we're generating?
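>>
>> (For comparison, a looser target would be something like
>> -XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=200 - i.e. 50ms
>> out of every 200ms - though I haven't verified whether that avoids
>> the full collections.)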
>>
>> My actual live heap usage should be very stable around 5GB - the
>> application very carefully accounts its live object set at around 60%
>> of the max heap (60% of 8000MB is 4800MB, which matches the
>> ~4.9-5.0GB occupancy after each Full GC in the logs above).
>>
>> At this point we are considering doing crazy things like ripping out
>> our main memory consumers into a custom slab allocator and manually
>> reference-counting the byte array slices. But if we can get G1GC to
>> work for us, it will save a lot of engineering on the application
>> side!
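>>
>> Very roughly, the slab idea looks like this (hypothetical sketch,
>> not a real design):
>>
>>   import java.util.concurrent.atomic.AtomicInteger;
>>
>>   // Hand out offsets into a few large, long-lived slabs and
>>   // reference-count slices by hand, so the GC sees a handful of
>>   // stable byte[]s instead of millions of tenured arrays.
>>   class Slab {
>>       final byte[] data;
>>       private final AtomicInteger liveSlices = new AtomicInteger();
>>       private int cursor; // next free offset into data
>>
>>       Slab(int size) { data = new byte[size]; }
>>
>>       // Returns the offset of a new slice, or -1 if this slab is full.
>>       synchronized int allocate(int len) {
>>           if (cursor + len > data.length) return -1;
>>           liveSlices.incrementAndGet();
>>           int off = cursor;
>>           cursor += len;
>>           return off;
>>       }
>>
>>       // Called when a slice's last reference drops; once every slice
>>       // is gone, the whole slab is recycled with no GC involvement.
>>       synchronized void release() {
>>           if (liveSlices.decrementAndGet() == 0) cursor = 0;
>>       }
>>   }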
>>
>> Thanks
>> -Todd
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>>
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


