G1GC Full GCs
Todd Lipcon
todd at cloudera.com
Mon Jan 24 08:24:55 UTC 2011
Added one more bit of debug output here: whenever it comes across a region
with only one live object, it also dumps stats on the regions in its
coarse rset. For example:
Region 0x00002aaab0c48388 ( M1) [0x00002aac7f000000, 0x00002aac7f400000]
Used: 4096K, garbage: 4095K. Eff: 30.061317 K/ms
Very low-occupancy low-efficiency region.
RSet: coarse: 90112 fine: 0 sparse: 2717
 num     #instances         #bytes  class name
----------------------------------------------
   1:             1             56  java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
Total             1             56
Coarse region references:
--------------------------
Region 0x00002aaab04b5288 ( M1) [0x00002aab5b000000, 0x00002aab5b400000]
Used: 4096K, garbage: 4094K. Eff: 243.385409 K/ms
Region 0x00002aaab04c2708 ( M1) [0x00002aab5d000000, 0x00002aab5d400000]
Used: 4096K, garbage: 1975K. Eff: 366.659049 K/ms
Region 0x00002aaab054de48 ( M1) [0x00002aab72000000, 0x00002aab72400000]
Used: 4096K, garbage: 4095K. Eff: 40.958295 K/ms
Region 0x00002aaab0622648 ( M1) [0x00002aab92000000, 0x00002aab92400000]
Used: 4096K, garbage: 4042K. Eff: 40.304866 K/ms
Region 0x00002aaab0c30fa8 ( M1) [0x00002aac7b800000, 0x00002aac7bc00000]
Used: 4096K, garbage: 4094K. Eff: 53.233756 K/ms
Region 0x00002aaab0c32a38 ( M1) [0x00002aac7bc00000, 0x00002aac7c000000]
Used: 4096K, garbage: 4094K. Eff: 143.159938 K/ms
Region 0x00002aaab0c4b8a8 ( M1) [0x00002aac7f800000, 0x00002aac7fc00000]
Used: 4096K, garbage: 4095K. Eff: 53.680457 K/ms
Region 0x00002aaab0c50858 ( M1) [0x00002aac80400000, 0x00002aac80800000]
Used: 4096K, garbage: 4095K. Eff: 20.865626 K/ms
Region 0x00002aaab0c522e8 ( M1) [0x00002aac80800000, 0x00002aac80c00000]
Used: 4096K, garbage: 4094K. Eff: 36.474733 K/ms
Region 0x00002aaab0c6b158 ( M1) [0x00002aac84400000, 0x00002aac84800000]
Used: 4096K, garbage: 4095K. Eff: 19.686717 K/ms
Region 0x00002aaab0d74b58 ( M1) [0x00002aacac400000, 0x00002aacac800000]
Used: 4096K, garbage: 4095K. Eff: 36.379891 K/ms
--------------------------
So here we have a region with 11 coarse rset members, all of which are
themselves pretty low-efficiency. They're not going to get collected, and
neither will this region.
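As a sanity check on the numbers: "coarse: 90112" in the RSet line above is
exactly 11 * 8192, i.e. 11 members' worth of 512-byte cards in a 4 MB
region, so coarse occupancy appears to be counted in cards here. If that
reading is right, the cost asymmetry is easy to model. A standalone
back-of-the-envelope sketch (the per-card scan cost is a number I made up
for illustration, not HotSpot's fitted predictor):

#include <cstdio>

// Toy cost model: a coarse rset entry stands for "some card(s) somewhere
// in that region", so the whole referencing region must be card-scanned
// at evacuation time, while a sparse entry is a single card.
int main() {
    const long   region_bytes     = 4L * 1024 * 1024;   // 4 MB regions, per the log
    const long   cards_per_region = region_bytes / 512; // = 8192 cards
    const long   coarse_regions   = 11;                 // members listed above
    const long   sparse_cards     = 2717;               // "sparse: 2717"
    const double us_per_card      = 1.0;                // assumed, for illustration

    long   cards      = coarse_regions * cards_per_region + sparse_cards;
    double scan_ms    = cards * us_per_card / 1000.0;
    double garbage_kb = 4095.0;                         // reclaimable space
    printf("cards to scan: %ld (coarse share: %ld)\n",
           cards, coarse_regions * cards_per_region);
    printf("predicted scan: %.1f ms -> efficiency %.1f KB/ms\n",
           scan_ms, garbage_kb / scan_ms);
    return 0;
}

With the coarse share dominating the scan cost, the predicted efficiency
lands in the tens of KB/ms no matter how much garbage the region holds.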
Basically we always devolve into a case where there are a bunch of these
inefficient regions all referring to each other in the coarse rsets.
Would it be possible to add some phase which "breaks down" the coarse
members back into sparse/fine? Note how in this case all of the fine
references are gone. I imagine most of the coarse references are "ghosts" as
well - once upon a time there was an object in those regions that referred
to this region, but they're long since dead.
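To make the idea concrete, here is a toy model of the scrub I have in mind
(names, thresholds, and structure are all made up; this is not the real
HeapRegionRemSet code). At the end of marking, each coarse member would be
re-examined, and if only a handful of live inbound cards remain, the
member is demoted back to precise entries:

#include <cstdio>
#include <map>
#include <set>

// Toy remembered set: coarse members are whole from-region ids, sparse
// entries are per-from-region card sets. scrub() re-checks one coarse
// member against the cards that still hold live inbound pointers (which
// a real implementation would have to gather, e.g. during marking).
struct ToyRemSet {
    std::set<int> coarse_members;
    std::map<int, std::set<int>> sparse;

    void scrub(int from_region, const std::set<int>& live_cards,
               size_t sparse_limit) {
        if (coarse_members.count(from_region) == 0) return;
        if (live_cards.size() <= sparse_limit) {
            coarse_members.erase(from_region);     // ghost entries vanish;
            if (!live_cards.empty())
                sparse[from_region] = live_cards;  // real ones become precise
        }
    }
};

int main() {
    ToyRemSet rs;
    rs.coarse_members = {1, 2, 3};
    rs.scrub(1, {}, 4);        // region 1 no longer refers to us at all
    rs.scrub(2, {17}, 4);      // region 2 has one surviving inbound card
    printf("coarse left: %zu, sparse entries: %zu\n",
           rs.coarse_members.size(), rs.sparse.size());
    return 0;
}

The open question is where to charge the cost of gathering the live
inbound cards; since marking already walks the live objects, piggy-backing
on it seems at least plausible.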
On Sun, Jan 23, 2011 at 10:16 PM, Todd Lipcon <todd at cloudera.com> wrote:
> Unfortunately my test is not easy to reproduce in its current form. But as
> I look more and more into it, it looks like we're running into the same
> issue.
>
> I added some code at the end of the mark phase that, after it sorts the
> regions by efficiency, prints an object histogram for any region that is
> more than 98% garbage but very inefficient (<100 KB/ms predicted
> collection rate).
>
> Here's an example of an "uncollectable" region that is all garbage but for
> one object:
>
> Region 0x00002aaab0203e18 ( M1) [0x00002aaaf3800000, 0x00002aaaf3c00000]
> Used: 4096K, garbage: 4095K. Eff: 6.448103 K/ms
> Very low-occupancy low-efficiency region. Histogram:
>
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:             1            280  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
> Total             1            280
>
> At 6 K/ms it's predicted to take 600+ms to collect this region, so it
> will never happen.
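> (Concretely: 4095 KB of garbage / 6.448 KB/ms gives roughly 635 ms of
> predicted pause time for this single 4 MB region.)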
>
> I can't think of any way there would be a high mutation rate of
> references to this Entry object...
>
> So, my shot-in-the-dark theory is similar to what Peter was thinking.
> When a large number of other regions reference a region at some point in
> its lifetime, even briefly, its sparse table will overflow into coarse
> entries. Later, when the region is down to just one live object with a
> very small number of inbound references, it still has all of those
> coarse entries: they never get scrubbed, because the referring regions
> are suffering from the same issue.
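>
> If I have the mechanism right, the key point is that promotion to coarse
> is one-way. A toy model of it (names and the overflow threshold are made
> up; this is not the real OtherRegionsTable code, which also has a
> fine-grained tier in between):
>
> #include <cstdio>
> #include <map>
> #include <set>
>
> // Toy rset for one region: record which cards of which from-regions
> // point at us; overflow demotes precision, and nothing restores it.
> struct ToyRSet {
>     static const size_t kSparseLimit = 4;  // real limit differs
>     std::map<int, std::set<int>> sparse;   // from-region -> inbound cards
>     std::set<int> coarse;                  // from-regions remembered wholesale
>
>     void add_reference(int from_region, int card) {
>         if (coarse.count(from_region)) return;   // already coarse
>         std::set<int>& cards = sparse[from_region];
>         cards.insert(card);
>         if (cards.size() > kSparseLimit) {       // overflow: promote...
>             sparse.erase(from_region);
>             coarse.insert(from_region);          // ...and never demote, even
>         }                                        // after the referring
>     }                                            // objects are all dead
> };
>
> int main() {
>     ToyRSet rs;
>     for (int card = 0; card < 10; card++)
>         rs.add_reference(7, card);  // brief burst of refs from region 7
>     // Those referring objects may be long dead by now, but:
>     printf("region 7 still coarse? %s\n", rs.coarse.count(7) ? "yes" : "no");
>     return 0;
> }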
>
> Thoughts?
>
> -Todd
>
> On Sun, Jan 23, 2011 at 12:42 AM, Peter Schuller <
> peter.schuller at infidyne.com> wrote:
>
>> > I still seem to be putting off GC of non-young regions too much
>> > though.
>>
>> Part of the experiments I have been harping on was the change below,
>> which cuts GC efficiency out of the decision to perform non-young
>> collections. I'm not suggesting it actually be disabled, but perhaps
>> it can be adjusted to fit your workload? If there is nothing outright
>> wrong in terms of predictions and the problem is due to cost estimates
>> being too high, that may be a way to avoid full GCs at the expense of
>> more expensive GC activity. This smells like something that should be
>> a tweakable VM option: just as GCTimeRatio affects heap expansion
>> decisions, something to affect this (probably just a ratio applied to
>> the test below?).
>>
>> Another thing: this is in large part my confirmation-biased human
>> brain speaking, but I would be really interested to find out if the
>> slow build-up you seem to be experiencing is indeed due to rs scan
>> costs due to sparse table overflow (I've been harping on roughly the
>> same thing several times, so maybe people are tired of it; most
>> recently in the thread "g1: dealing with high rates of inter-region
>> pointer writes").
>>
>> Is your test easily runnable so that one can reproduce? Preferably
>> without lots of HBase/Hadoop knowledge. I.e., is it something that can
>> be run in a self-contained fashion fairly easily?
>>
>> Here's the patch indicating where to adjust the efficiency thresholding:
>>
>> --- a/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp  Fri Dec 17 23:32:58 2010 -0800
>> +++ b/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp  Sun Jan 23 09:21:54 2011 +0100
>> @@ -1463,7 +1463,7 @@
>>    if ( !_last_young_gc_full ) {
>>      if ( _should_revert_to_full_young_gcs ||
>>           _known_garbage_ratio < 0.05 ||
>> -         (adaptive_young_list_length() &&
>> +         (adaptive_young_list_length() && //false && // scodetodo
>>           (get_gc_eff_factor() * cur_efficiency < predict_young_gc_eff())) ) {
>>        set_full_young_gcs(true);
>>      }
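>>
>> Concretely, what I'm picturing is something along these lines (the
>> flag name G1NonYoungGCEffRatio is made up, just to show the shape):
>>
>>          (adaptive_young_list_length() &&
>>           (get_gc_eff_factor() * cur_efficiency <
>>            G1NonYoungGCEffRatio * predict_young_gc_eff())) ) {
>>
>> With the ratio at 1.0 you get today's behaviour; pushing it below 1.0
>> makes the policy more willing to keep doing partially-young
>> collections even when they look less efficient than young-only ones.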
>>
>>
>> --
>> / Peter Schuller
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
--
Todd Lipcon
Software Engineer, Cloudera