Added one more bit of debug output here: whenever it comes across a region with only one live object, it also dumps the stats on the regions in its coarse RSet. For example:

Region 0x00002aaab0c48388 ( M1) [0x00002aac7f000000, 0x00002aac7f400000] Used: 4096K, garbage: 4095K. Eff: 30.061317 K/ms
 Very low-occupancy low-efficiency region.
 RSet: coarse: 90112 fine: 0 sparse: 2717

 num     #instances         #bytes  class name
----------------------------------------------
   1:             1             56  java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
Total             1             56
Coarse region references:
--------------------------
Region 0x00002aaab04b5288 ( M1) [0x00002aab5b000000, 0x00002aab5b400000] Used: 4096K, garbage: 4094K. Eff: 243.385409 K/ms
Region 0x00002aaab04c2708 ( M1) [0x00002aab5d000000, 0x00002aab5d400000] Used: 4096K, garbage: 1975K. Eff: 366.659049 K/ms
Region 0x00002aaab054de48 ( M1) [0x00002aab72000000, 0x00002aab72400000] Used: 4096K, garbage: 4095K. Eff: 40.958295 K/ms
Region 0x00002aaab0622648 ( M1) [0x00002aab92000000, 0x00002aab92400000] Used: 4096K, garbage: 4042K. Eff: 40.304866 K/ms
Region 0x00002aaab0c30fa8 ( M1) [0x00002aac7b800000, 0x00002aac7bc00000] Used: 4096K, garbage: 4094K. Eff: 53.233756 K/ms
Region 0x00002aaab0c32a38 ( M1) [0x00002aac7bc00000, 0x00002aac7c000000] Used: 4096K, garbage: 4094K. Eff: 143.159938 K/ms
Region 0x00002aaab0c4b8a8 ( M1) [0x00002aac7f800000, 0x00002aac7fc00000] Used: 4096K, garbage: 4095K. Eff: 53.680457 K/ms
Region 0x00002aaab0c50858 ( M1) [0x00002aac80400000, 0x00002aac80800000] Used: 4096K, garbage: 4095K. Eff: 20.865626 K/ms
Region 0x00002aaab0c522e8 ( M1) [0x00002aac80800000, 0x00002aac80c00000] Used: 4096K, garbage: 4094K. Eff: 36.474733 K/ms
Region 0x00002aaab0c6b158 ( M1) [0x00002aac84400000, 0x00002aac84800000] Used: 4096K, garbage: 4095K. Eff: 19.686717 K/ms
Region 0x00002aaab0d74b58 ( M1) [0x00002aacac400000, 0x00002aacac800000] Used: 4096K, garbage: 4095K. Eff: 36.379891 K/ms
--------------------------

So here we have a region with 11 coarse RSet members, all of which are pretty low-efficiency. They're not going to get collected, and neither will this region.

Basically we always devolve into a case where there are a bunch of these inefficient regions all referring to each other in their coarse RSets.

Would it be possible to add some phase which "breaks down" the coarse members back into sparse/fine? Note how in this case all of the fine references are gone. I imagine most of the coarse references are "ghosts" as well: once upon a time there was an object in those regions that referred to this region, but they're long since dead.
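
To make that concrete, here's a toy sketch of such a "break down / scrub" pass. Emphatically not HotSpot code: the types, names and thresholds below are all invented. The idea is just to walk each coarse member, re-derive the cards it still contributes, and drop or demote entries that have decayed:

// Toy sketch of a coarse-entry scrub phase. Nothing here matches the
// real HotSpot data structures; every name and threshold is made up.
#include <bitset>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <set>
#include <utility>
#include <vector>

constexpr std::size_t kMaxRegions = 2048;

struct ToyRSet {
  std::bitset<kMaxRegions> coarse;  // one bit per referencing region
  std::set<std::pair<std::size_t, std::size_t>> sparse;  // (region, card)
  static constexpr std::size_t kSparseLimit = 128;
};

// live_cards_from(r) stands in for re-scanning region r for pointers into
// this region (e.g. piggybacked on marking); here it is just a callback.
void scrub_coarse(ToyRSet& rs,
                  const std::function<std::vector<std::size_t>(std::size_t)>&
                      live_cards_from) {
  for (std::size_t r = 0; r < kMaxRegions; ++r) {
    if (!rs.coarse.test(r)) continue;
    std::vector<std::size_t> cards = live_cards_from(r);
    if (cards.empty()) {
      rs.coarse.reset(r);  // "ghost" entry: no live referents left, drop it
    } else if (rs.sparse.size() + cards.size() <= ToyRSet::kSparseLimit) {
      for (std::size_t c : cards) rs.sparse.insert({r, c});
      rs.coarse.reset(r);  // demote coarse back down to sparse
    }                      // else: still genuinely dense, keep it coarse
  }
}

int main() {
  ToyRSet rs;
  rs.coarse.set(7);  // region 7 once referenced this region heavily...
  auto no_live_refs = [](std::size_t) { return std::vector<std::size_t>{}; };
  scrub_coarse(rs, no_live_refs);  // ...but every referring object has died
  std::printf("coarse entries left: %zu\n", rs.coarse.count());  // prints 0
  return 0;
}

The obvious cost is the re-scan of each coarse member, which is presumably why such a pass would have to ride along with an existing phase like marking rather than run on its own.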
<div><br></div><div class="gmail_quote">On Sun, Jan 23, 2011 at 10:16 PM, Todd Lipcon <span dir="ltr"><<a href="mailto:todd@cloudera.com">todd@cloudera.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Unfortunately my test is not easy to reproduce in its current form. But as I look more and more into it, it looks like we're running into the same issue.

I added some code at the end of the mark phase that, after it sorts the regions by efficiency, will print an object histogram for any region that is >98% garbage but very inefficient (<100 KB/ms predicted collection rate).
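
(Roughly, the filter is of this shape; the sketch below is illustrative only, not the actual patch, and ToyRegion with its fields is invented rather than being HotSpot's HeapRegion API:)

// Illustrative sketch of the ">98% garbage but <100 KB/ms" filter; the
// type and field names are made up, not HotSpot's.
#include <cstdio>
#include <vector>

struct ToyRegion {
  const char* name;
  double used_kb;              // space used in the region, in KB
  double garbage_kb;           // space known dead after marking, in KB
  double predicted_kb_per_ms;  // predicted reclaim rate (the "Eff" figure)
};

// Flag regions that are almost entirely garbage yet are predicted to
// collect very slowly.
void dump_suspicious_regions(const std::vector<ToyRegion>& sorted_by_eff) {
  for (const ToyRegion& r : sorted_by_eff) {
    double garbage_ratio = r.garbage_kb / r.used_kb;
    if (garbage_ratio > 0.98 && r.predicted_kb_per_ms < 100.0) {
      std::printf("%s: %.1f%% garbage, eff %.2f K/ms -> dump histogram\n",
                  r.name, 100.0 * garbage_ratio, r.predicted_kb_per_ms);
      // ...the real debug code then walks the region's live objects and
      // tallies per-class instance counts for the histogram...
    }
  }
}

int main() {
  std::vector<ToyRegion> regions = {
      {"0x...203e18", 4096.0, 4095.0, 6.448103},    // like the region below
      {"0x...4b5288", 4096.0, 4094.0, 243.385409},  // efficient enough, skipped
  };
  dump_suspicious_regions(regions);
  return 0;
}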

Here's an example of an "uncollectable" region that is all garbage but for one object:

Region 0x00002aaab0203e18 ( M1) [0x00002aaaf3800000, 0x00002aaaf3c00000] Used: 4096K, garbage: 4095K. Eff: 6.448103 K/ms
 Very low-occupancy low-efficiency region. Histogram:

 num     #instances         #bytes  class name
----------------------------------------------
   1:             1            280  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
Total             1            280

At ~6.4 K/ms it's predicted to take 600+ms to collect this region (4095K / 6.448 K/ms ≈ 635 ms), so it will never happen.

I can't think of any way that there would be a high mutation rate of references to this Entry object.

So, my shot-in-the-dark theory is similar to what Peter was thinking: when a region, over its lifetime, has a large number of other regions reference it, even briefly, its sparse table will overflow. Then, later in life, when it's down to just one object with a very small number of inbound references, it still has all of those coarse entries -- they don't get scrubbed, because those regions are suffering the same issue.
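
(For anyone not following the RSet details: the per-region remembered set records incoming references at sparse/fine/coarse granularities, and coarsening is effectively one-way -- once a referencing region is recorded at coarse, whole-region granularity it doesn't get refined back. The toy model below, with made-up structure and threshold and the fine level omitted, is not HotSpot's HeapRegionRemSet; it just shows how a brief burst of references leaves a permanent coarse entry:)

// Toy model of one-way coarsening; the structure, names and threshold
// are invented, and the "fine" granularity is left out entirely.
#include <cstddef>
#include <cstdio>
#include <map>
#include <set>

struct ToyPerRegionRSet {
  // Exact card info kept per referencing region while it stays small.
  std::map<std::size_t, std::set<std::size_t>> sparse;
  // Referencing regions remembered only at whole-region granularity.
  std::set<std::size_t> coarse;
  static constexpr std::size_t kSparsePerRegionLimit = 4;

  void add_reference(std::size_t from_region, std::size_t card) {
    if (coarse.count(from_region)) return;  // already coarsened, never refined
    std::set<std::size_t>& cards = sparse[from_region];
    cards.insert(card);
    if (cards.size() > kSparsePerRegionLimit) {
      sparse.erase(from_region);   // overflow: forget the exact cards...
      coarse.insert(from_region);  // ...remember only "region X points here"
    }
  }
  // There is no removal path: when the objects behind those cards die,
  // the coarse entry stays -- the "ghost" references described above.
};

int main() {
  ToyPerRegionRSet rs;
  for (std::size_t card = 0; card < 10; ++card)
    rs.add_reference(/*from_region=*/42, card);  // one brief burst of refs
  std::printf("sparse: %zu, coarse: %zu\n",
              rs.sparse.size(), rs.coarse.size());  // "sparse: 0, coarse: 1"
  return 0;
}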

Thoughts?

-Todd

On Sun, Jan 23, 2011 at 12:42 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>> I still seem to be putting off GC of non-young regions too much though. I<br>
<br>
Part of the experiments I have been harping on was the below change to
cut GC efficiency out of the decision to perform non-young
collections. I'm not suggesting it actually be disabled, but perhaps
it can be adjusted to fit your workload? If there is nothing outright
wrong in terms of predictions, and the problem is due to cost estimates
being too high, that may be a way to avoid full GCs at the expense of
more expensive GC activity. This smells like something that should be
a tweakable VM option. Just like GCTimeRatio affects heap expansion
decisions, something to affect this (probably just a ratio applied to
the test below?).
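
For concreteness, the shape of what I mean is something like the toy
below; nothing in it is a real HotSpot flag or function, it only
illustrates a ratio being applied to that efficiency comparison:

// Toy illustration of "a ratio applied to the test below"; the function
// and the numbers are invented, this is not HotSpot code.
#include <cstdio>

// Efficiencies are in KB reclaimed per ms of predicted pause time.
// ratio == 1.0 is the stock comparison; ratio < 1.0 makes the policy
// more reluctant to revert to full-young collections.
static bool revert_to_full_young(double cur_efficiency,
                                 double predicted_young_eff,
                                 double ratio) {
  return cur_efficiency < ratio * predicted_young_eff;
}

int main() {
  double cur = 40.0, young = 300.0;  // made-up example efficiencies
  std::printf("ratio 1.0 -> revert? %d\n",
              revert_to_full_young(cur, young, 1.0));  // 1: yes, give up
  std::printf("ratio 0.1 -> revert? %d\n",
              revert_to_full_young(cur, young, 0.1));  // 0: keep going non-young
  return 0;
}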

Another thing: this is in large part my confirmation-biased brain
speaking, but I would be really interested to find out if the
slow build-up you seem to be experiencing is indeed due to RS scan
costs due to sparse table overflow (I've been harping on roughly the
same thing several times, so maybe people are tired of it; most
recently in the thread "g1: dealing with high rates of inter-region
pointer writes").

Is your test easily runnable so that one can reproduce it? Preferably
without lots of hbase/hadoop knowledge. I.e., is it something that can
be run in a self-contained fashion fairly easily?

Here's the patch indicating where to adjust the efficiency thresholding:

--- a/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp  Fri Dec 17 23:32:58 2010 -0800
+++ b/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp  Sun Jan 23 09:21:54 2011 +0100
@@ -1463,7 +1463,7 @@
   if ( !_last_young_gc_full ) {
     if ( _should_revert_to_full_young_gcs ||
          _known_garbage_ratio < 0.05 ||
-         (adaptive_young_list_length() &&
+         (adaptive_young_list_length() && //false && // scodetodo
          (get_gc_eff_factor() * cur_efficiency < predict_young_gc_eff())) ) {
       set_full_young_gcs(true);
     }

--
/ Peter Schuller

--
Todd Lipcon
Software Engineer, Cloudera