G1GC Full GCs

Mon Jul 12 12:43:47 PDT 2010

Hi Peter --

Yes, my guess was also that something (possibly along the lines
you stated below) was preventing the selection of certain (sets
of) regions for evacuation on a regular basis ... I am told there
are flags that will allow you to get verbose details on what is
or is not selected for inclusion in the collection set; perhaps
that will help you get to the bottom of this. Did you say
you had a test case that showed this behaviour? Filing a bug
with that test case may be the quickest way to get this before
the right set of eyes. Over to the G1 cognoscenti.

-- ramki

On 07/12/10 09:02, Peter Schuller wrote:
>> Am I missing some tuning that should be done for G1GC for applications like
>> this? Is 20ms out of 80ms too aggressive a target for the garbage rates
>> we're generating?
> 
> I have never run HBase, but in an LRU stress test (I posted about it a
> few months ago) I specifically observed remembered set scanning costs
> go way up. In addition I was seeing fallbacks to full GCs recently in
> a slightly different test that I also posted about to -use, and that
> turned out to be a result of the estimated rset scanning costs being
> so high that regions were never selected for eviction even though they
> had very little live data. I would be very interested to hear if
> you're having the same problem. My last post on the topic is here:
> 
>    http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-June/000652.html
> 
> Including the link to the (throw-away) patch that should tell you
> whether this is what's happening:
> 
>    http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch
> 
> Out of personal curiosity I'd be very interested to hear whether this
> is what's happening to you (in a real reasonable use-case rather than
> a synthetic benchmark).
> 
> My sense (and hotspot/g1 developers please smack me around if I am
> misrepresenting anything here) is that the effect I saw (with rset
> scanning costs) could cause perpetual memory growth (until fallback to
> full GC) in two ways:
> 
> (1) The estimated (and possibly real) cost of rset scanning for a
> single region could be so high that it is never possible to select it
> for eviction given the requested pause time goals. Hence, such a
> region effectively "leaks" until full GC.
> 
> (2) The estimated (and possibly real) cost of rset scanning for
> regions may be so high that there are, in practice, always other
> regions selected for high pay-off/cost ratios, such that they end up
> never being collected even if theoretically a single region could be
> evicted within the pause time goal.
> 
> These are effectively the same thing, with (1) being an extreme case of (2).
> 
> In both cases, the effect should be mitigated (and has been in the
> case where I did my testing), but as far as I can tell not generally
> "fixed", by increasing the pause time goals.
> 
> It is unclear to me how this is intended to be handled. The original
> g1 paper mentions an rset scanning thread that I suspect is intended
> to do rset scanning in the background, so that regions like these
> could be evicted more cheaply during the STW eviction pause. I didn't
> find such a thread anywhere in the source code, but I may very well
> just be missing it.
>
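
To make cases (1) and (2) above concrete, here is a minimal toy model
in Java. It is only a sketch under stated assumptions: the class, the
region names and every number are hypothetical, and the greedy
"best reclaimable-per-millisecond first" rule is a simplification, not
HotSpot's actual collection set policy. It shows how a region that is
almost all garbage can still be skipped at every pause when its
predicted rset scanning cost is high.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only. All names and numbers are hypothetical; this is not
// HotSpot's actual collection set policy.
class CollectionSetSketch {

    // One heap region with made-up cost predictions for a single pause.
    static final class Region {
        final String name;
        final double reclaimableMb; // garbage freed if the region is collected
        final double rsetScanMs;    // predicted remembered set scanning cost
        final double copyMs;        // predicted cost of copying live objects

        Region(String name, double reclaimableMb, double rsetScanMs, double copyMs) {
            this.name = name;
            this.reclaimableMb = reclaimableMb;
            this.rsetScanMs = rsetScanMs;
            this.copyMs = copyMs;
        }

        double costMs() { return rsetScanMs + copyMs; }
        double efficiency() { return reclaimableMb / costMs(); }
    }

    // Greedy selection: most efficient regions first, and a region is
    // added only if its predicted cost fits the remaining pause budget.
    static List<String> select(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<String> chosen = new ArrayList<>();
        double remainingMs = pauseBudgetMs;
        for (Region r : sorted) {
            if (r.costMs() <= remainingMs) {
                chosen.add(r.name);
                remainingMs -= r.costMs();
            }
        }
        return chosen;
    }

    static List<Region> sampleRegions() {
        return Arrays.asList(
            new Region("e1", 0.90, 1.0, 2.0),        // cheap regions
            new Region("e2", 0.80, 1.0, 3.0),
            new Region("e3", 0.75, 1.0, 2.0),
            new Region("e4", 0.60, 1.0, 3.0),
            new Region("old-hot", 0.95, 15.0, 0.5),  // fits a 20 ms goal alone
            new Region("old-huge", 0.95, 30.0, 0.5)  // exceeds a 20 ms goal alone
        );
    }

    public static void main(String[] args) {
        System.out.println("20 ms goal: " + select(sampleRegions(), 20.0));
        System.out.println("40 ms goal: " + select(sampleRegions(), 40.0));
    }
}
```

With the 20 ms goal only e1..e4 are picked: "old-huge" is case (1),
since its predicted cost alone exceeds the goal, and "old-hot" is case
(2), since it would fit by itself but is always outranked. Raising the
goal to 40 ms lets "old-hot" in, while "old-huge" moves from case (1)
to case (2) and is still left behind, which matches the "mitigated but
not fixed" observation above.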