Added one more bit of debug output here: whenever it comes across a region with only one live object, it also dumps the stats on the regions in its coarse RSet. For example:

Region 0x00002aaab0c48388 ( M1) [0x00002aac7f000000, 0x00002aac7f400000] Used: 4096K, garbage: 4095K. Eff: 30.061317 K/ms
 Very low-occupancy low-efficiency region.
 RSet: coarse: 90112 fine: 0 sparse: 2717

 num     #instances         #bytes  class name
----------------------------------------------
   1:             1             56  java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
Total             1             56
Coarse region references:
--------------------------
Region 0x00002aaab04b5288 ( M1) [0x00002aab5b000000, 0x00002aab5b400000] Used: 4096K, garbage: 4094K. Eff: 243.385409 K/ms
Region 0x00002aaab04c2708 ( M1) [0x00002aab5d000000, 0x00002aab5d400000] Used: 4096K, garbage: 1975K. Eff: 366.659049 K/ms
Region 0x00002aaab054de48 ( M1) [0x00002aab72000000, 0x00002aab72400000] Used: 4096K, garbage: 4095K. Eff: 40.958295 K/ms
Region 0x00002aaab0622648 ( M1) [0x00002aab92000000, 0x00002aab92400000] Used: 4096K, garbage: 4042K. Eff: 40.304866 K/ms
Region 0x00002aaab0c30fa8 ( M1) [0x00002aac7b800000, 0x00002aac7bc00000] Used: 4096K, garbage: 4094K. Eff: 53.233756 K/ms
Region 0x00002aaab0c32a38 ( M1) [0x00002aac7bc00000, 0x00002aac7c000000] Used: 4096K, garbage: 4094K. Eff: 143.159938 K/ms
Region 0x00002aaab0c4b8a8 ( M1) [0x00002aac7f800000, 0x00002aac7fc00000] Used: 4096K, garbage: 4095K. Eff: 53.680457 K/ms
Region 0x00002aaab0c50858 ( M1) [0x00002aac80400000, 0x00002aac80800000] Used: 4096K, garbage: 4095K. Eff: 20.865626 K/ms
Region 0x00002aaab0c522e8 ( M1) [0x00002aac80800000, 0x00002aac80c00000] Used: 4096K, garbage: 4094K. Eff: 36.474733 K/ms
Region 0x00002aaab0c6b158 ( M1) [0x00002aac84400000, 0x00002aac84800000] Used: 4096K, garbage: 4095K. Eff: 19.686717 K/ms
Region 0x00002aaab0d74b58 ( M1) [0x00002aacac400000, 0x00002aacac800000] Used: 4096K, garbage: 4095K. Eff: 36.379891 K/ms
--------------------------

So here we have a region with 11 coarse RSet members, all of which are pretty low-efficiency. They're not going to get collected, and neither will this region.

Basically we always devolve into a case where there are a bunch of these inefficient regions all referring to each other in their coarse RSets.

Would it be possible to add some phase which "breaks down" the coarse members back into sparse/fine? Note how in this case all of the fine references are gone. I imagine most of the coarse references are "ghosts" as well: once upon a time there was an object in those regions that referred to this region, but they're long since dead.
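
To make that concrete, here's a toy sketch of such a "break down / scrub" pass. Emphatically not HotSpot code: the types, names and thresholds below are all invented. The idea is just to walk each coarse member, re-derive the cards it still contributes, and drop or demote entries that have decayed:

// Toy sketch of a coarse-entry scrub phase. Nothing here matches the
// real HotSpot data structures; every name and threshold is made up.
#include <bitset>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <set>
#include <utility>
#include <vector>

constexpr std::size_t kMaxRegions = 2048;

struct ToyRSet {
  std::bitset<kMaxRegions> coarse;  // one bit per referencing region
  std::set<std::pair<std::size_t, std::size_t>> sparse;  // (region, card)
  static constexpr std::size_t kSparseLimit = 128;
};

// live_cards_from(r) stands in for re-scanning region r for pointers into
// this region (e.g. piggybacked on marking); here it is just a callback.
void scrub_coarse(ToyRSet& rs,
                  const std::function<std::vector<std::size_t>(std::size_t)>&
                      live_cards_from) {
  for (std::size_t r = 0; r < kMaxRegions; ++r) {
    if (!rs.coarse.test(r)) continue;
    std::vector<std::size_t> cards = live_cards_from(r);
    if (cards.empty()) {
      rs.coarse.reset(r);  // "ghost" entry: no live referents left, drop it
    } else if (rs.sparse.size() + cards.size() <= ToyRSet::kSparseLimit) {
      for (std::size_t c : cards) rs.sparse.insert({r, c});
      rs.coarse.reset(r);  // demote coarse back down to sparse
    }                      // else: still genuinely dense, keep it coarse
  }
}

int main() {
  ToyRSet rs;
  rs.coarse.set(7);  // region 7 once referenced this region heavily...
  auto no_live_refs = [](std::size_t) { return std::vector<std::size_t>{}; };
  scrub_coarse(rs, no_live_refs);  // ...but every referring object has died
  std::printf("coarse entries left: %zu\n", rs.coarse.count());  // prints 0
  return 0;
}

The obvious cost is the re-scan of each coarse member, which is presumably why such a pass would have to ride along with an existing phase like marking rather than run on its own.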
<div><br></div><div class="gmail_quote">On Sun, Jan 23, 2011 at 10:16 PM, Todd Lipcon <span dir="ltr"><<a href="mailto:todd@cloudera.com">todd@cloudera.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Unfortunately my test is not easy to reproduce in its current form. But as I look more and more into it, it looks like we're running into the same issue.

I added some code at the end of the mark phase that, after it sorts the regions by efficiency, will print an object histogram for any region that is >98% garbage but very inefficient (<100 KB/ms predicted collection rate).
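
(Roughly, the filter is of this shape; the sketch below is illustrative only, not the actual patch, and ToyRegion with its fields is invented rather than being HotSpot's HeapRegion API:)

// Illustrative sketch of the ">98% garbage but <100 KB/ms" filter; the
// type and field names are made up, not HotSpot's.
#include <cstdio>
#include <vector>

struct ToyRegion {
  const char* name;
  double used_kb;              // space used in the region, in KB
  double garbage_kb;           // space known dead after marking, in KB
  double predicted_kb_per_ms;  // predicted reclaim rate (the "Eff" figure)
};

// Flag regions that are almost entirely garbage yet are predicted to
// collect very slowly.
void dump_suspicious_regions(const std::vector<ToyRegion>& sorted_by_eff) {
  for (const ToyRegion& r : sorted_by_eff) {
    double garbage_ratio = r.garbage_kb / r.used_kb;
    if (garbage_ratio > 0.98 && r.predicted_kb_per_ms < 100.0) {
      std::printf("%s: %.1f%% garbage, eff %.2f K/ms -> dump histogram\n",
                  r.name, 100.0 * garbage_ratio, r.predicted_kb_per_ms);
      // ...the real debug code then walks the region's live objects and
      // tallies per-class instance counts for the histogram...
    }
  }
}

int main() {
  std::vector<ToyRegion> regions = {
      {"0x...203e18", 4096.0, 4095.0, 6.448103},    // like the region below
      {"0x...4b5288", 4096.0, 4094.0, 243.385409},  // efficient enough, skipped
  };
  dump_suspicious_regions(regions);
  return 0;
}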

Here's an example of an "uncollectable" region that is all garbage but for one object:

Region 0x00002aaab0203e18 ( M1) [0x00002aaaf3800000, 0x00002aaaf3c00000] Used: 4096K, garbage: 4095K. Eff: 6.448103 K/ms
 Very low-occupancy low-efficiency region. Histogram:

 num     #instances         #bytes  class name
----------------------------------------------
   1:             1            280  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
Total             1            280

At ~6.4 K/ms it's predicted to take 600+ms to collect this region (4095K / 6.448 K/ms ≈ 635 ms), so it will never happen.

I can't think of any way that there would be a high mutation rate of references to this Entry object.

So, my shot-in-the-dark theory is similar to what Peter was thinking: when a region, over its lifetime, has a large number of other regions reference it, even briefly, its sparse table will overflow. Then, later in life, when it's down to just one object with a very small number of inbound references, it still has all of those coarse entries -- they don't get scrubbed, because those regions are suffering the same issue.
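
(For anyone not following the RSet details: the per-region remembered set records incoming references at sparse/fine/coarse granularities, and coarsening is effectively one-way -- once a referencing region is recorded at coarse, whole-region granularity it doesn't get refined back. The toy model below, with made-up structure and threshold and the fine level omitted, is not HotSpot's HeapRegionRemSet; it just shows how a brief burst of references leaves a permanent coarse entry:)

// Toy model of one-way coarsening; the structure, names and threshold
// are invented, and the "fine" granularity is left out entirely.
#include <cstddef>
#include <cstdio>
#include <map>
#include <set>

struct ToyPerRegionRSet {
  // Exact card info kept per referencing region while it stays small.
  std::map<std::size_t, std::set<std::size_t>> sparse;
  // Referencing regions remembered only at whole-region granularity.
  std::set<std::size_t> coarse;
  static constexpr std::size_t kSparsePerRegionLimit = 4;

  void add_reference(std::size_t from_region, std::size_t card) {
    if (coarse.count(from_region)) return;  // already coarsened, never refined
    std::set<std::size_t>& cards = sparse[from_region];
    cards.insert(card);
    if (cards.size() > kSparsePerRegionLimit) {
      sparse.erase(from_region);   // overflow: forget the exact cards...
      coarse.insert(from_region);  // ...remember only "region X points here"
    }
  }
  // There is no removal path: when the objects behind those cards die,
  // the coarse entry stays -- the "ghost" references described above.
};

int main() {
  ToyPerRegionRSet rs;
  for (std::size_t card = 0; card < 10; ++card)
    rs.add_reference(/*from_region=*/42, card);  // one brief burst of refs
  std::printf("sparse: %zu, coarse: %zu\n",
              rs.sparse.size(), rs.coarse.size());  // "sparse: 0, coarse: 1"
  return 0;
}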

Thoughts?

-Todd

On Sun, Jan 23, 2011 at 12:42 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>> I still seem to be putting off GC of non-young regions too much though. I<br>
<br>
Part of the experiments I have been harping on was the below change to
cut GC efficiency out of the decision to perform non-young
collections. I'm not suggesting it actually be disabled, but perhaps
it can be adjusted to fit your workload? If there is nothing outright
wrong in terms of predictions, and the problem is due to cost estimates
being too high, that may be a way to avoid full GCs at the expense of
more expensive GC activity. This smells like something that should be
a tweakable VM option. Just like GCTimeRatio affects heap expansion
decisions, something to affect this (probably just a ratio applied to
the test below?).
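
For concreteness, the shape of what I mean is something like the toy
below; nothing in it is a real HotSpot flag or function, it only
illustrates a ratio being applied to that efficiency comparison:

// Toy illustration of "a ratio applied to the test below"; the function
// and the numbers are invented, this is not HotSpot code.
#include <cstdio>

// Efficiencies are in KB reclaimed per ms of predicted pause time.
// ratio == 1.0 is the stock comparison; ratio < 1.0 makes the policy
// more reluctant to revert to full-young collections.
static bool revert_to_full_young(double cur_efficiency,
                                 double predicted_young_eff,
                                 double ratio) {
  return cur_efficiency < ratio * predicted_young_eff;
}

int main() {
  double cur = 40.0, young = 300.0;  // made-up example efficiencies
  std::printf("ratio 1.0 -> revert? %d\n",
              revert_to_full_young(cur, young, 1.0));  // 1: yes, give up
  std::printf("ratio 0.1 -> revert? %d\n",
              revert_to_full_young(cur, young, 0.1));  // 0: keep going non-young
  return 0;
}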

Another thing: this is in large part my confirmation-biased brain
speaking, but I would be really interested to find out if the
slow build-up you seem to be experiencing is indeed due to RS scan
costs due to sparse table overflow (I've been harping on roughly the
same thing several times, so maybe people are tired of it; most
recently in the thread "g1: dealing with high rates of inter-region
pointer writes").

Is your test easily runnable so that one can reproduce it? Preferably
without lots of hbase/hadoop knowledge. I.e., is it something that can
be run in a self-contained fashion fairly easily?

Here's the patch indicating where to adjust the efficiency thresholding:

--- a/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp  Fri Dec 17 23:32:58 2010 -0800
+++ b/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp  Sun Jan 23 09:21:54 2011 +0100
@@ -1463,7 +1463,7 @@
   if ( !_last_young_gc_full ) {
     if ( _should_revert_to_full_young_gcs ||
          _known_garbage_ratio < 0.05 ||
-         (adaptive_young_list_length() &&
+         (adaptive_young_list_length() && //false && // scodetodo
          (get_gc_eff_factor() * cur_efficiency < predict_young_gc_eff())) ) {
       set_full_young_gcs(true);
     }

--
/ Peter Schuller

--
Todd Lipcon
Software Engineer, Cloudera