<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Todd,<br>
<br>
Could you send a segment of the GC logs from the beginning<br>
through the first dozen or so full GC's?<br>
<br>
Exactly which version of the JVM are you using?<br>
<br>
java -version<br>
<br>
will tell us.<br>
<br>
Do you have a test setup where you could do some experiments?<br>
<br>
Can you send the set of CMS flags you use? It might tell<br>
us something about the GC behavior of you application.<br>
Might not tell us anything but it's worth a look.<br>
<br>
Jon<br>
<br>
On 07/06/10 13:27, Todd Lipcon wrote:
<blockquote
cite="mid:AANLkTiniKm1ghON3AIl5JPNzOVqyasTxxQaRsGCPBnba@mail.gmail.com"
type="cite">Hi all,
<div><br>
</div>
<div>I work on HBase, a distributed database written in Java. We
generally run on large heaps (8GB+), and our object lifetime
distribution has proven pretty problematic for garbage collection (we
manage a multi-GB LRU cache inside the process, so in CMS we tenure a
lot of byte arrays which later get collected).</div>
<div><br>
</div>
<div>In Java6, we generally run with the CMS collector, tuning down
CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve
fairly low pause GC, but after a week or two of uptime we often run
into full heap compaction which takes several minutes and wreaks havoc
on the system.</div>
<div><br>
</div>
<div>Needless to say, we've been watching the development of the G1
GC with anticipation for the last year or two. Finally it seems in the
latest build of JDK7 it's stable enough for actual use (previously it
would segfault within an hour or so). However, in my testing I'm seeing
a fair amount of 8-10 second Full GC pauses.</div>
<div><br>
</div>
<div>The flags I'm passing are:</div>
<div><br>
</div>
<div>-Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:GCPauseIntervalMillis=80</div>
<div><br>
</div>
<div>Most of the pauses I see in the GC log are around 10-20ms as
expected:</div>
<div><br>
</div>
<div>
<div>2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial),
0.01209000 secs]</div>
<div> [Parallel Time: 10.5 ms]</div>
<div> [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2
1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0
1680080.1 1680080.0 1680079.9 1680081.5]</div>
<div> [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2
1.9 2.5 1.7 1.7 0.1</div>
<div> Avg: 1.8, Min: 0.1, Max: 2.5]</div>
<div> [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3</div>
<div> Sum: 52, Avg: 4, Min: 1, Max: 8]</div>
<div> [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7
0.4 0.5 0.4 0.5 0.3 0.3 0.0</div>
<div> Avg: 0.4, Min: 0.0, Max: 0.7]</div>
<div> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0</div>
<div> Avg: 0.0, Min: 0.0, Max: 0.0]</div>
<div> [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1
0.6 0.0 0.8 0.8 0.9</div>
<div> Avg: 0.5, Min: 0.0, Max: 0.9]</div>
<div> [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2
7.1 6.9 7.1 7.1 7.0</div>
<div> Avg: 7.1, Min: 6.9, Max: 7.3]</div>
<div> [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0</div>
<div> Avg: 0.0, Min: 0.0, Max: 0.0]</div>
<div> [Other: 0.6 ms]</div>
<div> [Clear CT: 0.5 ms]</div>
<div> [Other: 1.1 ms]</div>
<div> [Choose CSet: 0.0 ms]</div>
<div> [ 7677M->7636M(8000M)]</div>
<div> [Times: user=0.12 sys=0.00, real=0.01 secs]</div>
<div><br>
</div>
<div>But every 5-10 minutes I see a GC pause that lasts 10-15 seconds:</div>
<div>
<div>[todd@monster01 logs]$ grep 'Full GC' gc-hbase.log | tail</div>
<div>2010-07-06T12:50:41.216-0700: 52521.446: [Full GC
7934M->4865M(8000M), 9.8907800 secs]</div>
<div>2010-07-06T12:55:39.802-0700: 52820.032: [Full GC
7930M->4964M(8000M), 9.9025520 secs]</div>
<div>2010-07-06T13:02:26.872-0700: 53227.102: [Full GC
7934M->4882M(8000M), 10.1232190 secs]</div>
<div>2010-07-06T13:09:41.049-0700: 53661.279: [Full GC
7938M->5002M(8000M), 10.4997760 secs]</div>
<div>2010-07-06T13:18:51.531-0700: 54211.761: [Full GC
7938M->4962M(8000M), 11.0497380 secs]</div>
<div><br>
</div>
</div>
<div>These pauses are pretty unacceptable for soft real time
operation.</div>
<div><br>
</div>
<div>Am I missing some tuning that should be done for G1GC for
applications like this? Is 20ms out of 80ms too aggressive a target for
the garbage rates we're generating?</div>
<div><br>
</div>
<div>My actual live heap usage should be very stable around 5GB - the
application very carefully accounts its live object set at around 60%
of the max heap (as you can see in the logs above).</div>
<div><br>
</div>
<div>At this point we are considering doing crazy things like ripping
out our main memory consumers into a custom slab allocator, and
manually reference count the byte array slices. But, if we can get G1GC
to work for us, it will save a lot of engineering on the application
side!</div>
<div><br>
</div>
Thanks</div>
<div>-Todd</div>
<div><br>
-- <br>
Todd Lipcon<br>
Software Engineer, Cloudera<br>
</div>
<pre wrap="">
<hr size="4" width="90%">
_______________________________________________
hotspot-gc-use mailing list
<a class="moz-txt-link-abbreviated" href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a>
<a class="moz-txt-link-freetext" href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a>
</pre>
</blockquote>
</body>
</html>