Todd Lipcon todd at cloudera.com
Mon Jul 12 09:09:49 PDT 2010

Hi Peter,

This sounds interesting, and plausible to me (though I have no clue about
the codebase!)

I'm leaving tomorrow on a two-week trip, though, so I'm not sure I'll have
a chance to try the patch before I go. I'll certainly circle back on this
towards the end of the month.

Thanks again for all the continued help.


On Mon, Jul 12, 2010 at 9:02 AM, Peter Schuller <peter.schuller at infidyne.com> wrote:

> > Am I missing some tuning that should be done for G1GC for applications
> > like this? Is 20ms out of 80ms too aggressive a target for the garbage
> > rates we're generating?
>
> I have never run HBase, but in an LRU stress test (I posted about it a
> few months ago) I specifically observed remembered set scanning costs
> go way up. In addition, I was recently seeing fallbacks to full GCs in
> a slightly different test that I also posted about to -use, and that
> turned out to be a result of the estimated rset scanning costs being
> so high that regions were never selected for eviction even though they
> had very little live data. I would be very interested to hear if
> you're having the same problem. My last post on the topic is here:
>
>   http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-June/000652.html
>
> The (throw-away) patch that should tell you whether this is what's
> happening is here:
>
>   http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch
>
> Out of personal curiosity I'd be very interested to hear whether this
> is what's happening to you (in a real, reasonable use-case rather than
> a synthetic benchmark).
>
> My sense (and hotspot/g1 developers please smack me around if I am
> misrepresenting anything here) is that the effect I saw (with rset
> scanning costs) could cause perpetual memory growth (until fallback to
> full GC) in two ways:
> (1) The estimated (and possibly real) cost of rset scanning for a
> single region could be so high that it is never possible to select it
> for eviction given the asked for pause time goals. Hence, such a
> region effectively "leaks" until full GC.
> (2) The estimated (and possibly real) cost of rset scanning for
> regions may be so high that there are, in practice, always other
> regions selected for their higher pay-off/cost ratios, such that the
> expensive regions end up never being collected even if, in theory, any
> single one of them could be evicted within the pause time goal.
> These are effectively the same thing, with (1) being an extreme case of
> (2).
> In both cases, the effect should be mitigated (and has been in the
> case where I did my testing), but as far as I can tell not generally
> "fixed", by increasing the pause time goals.
> It is unclear to me how this is intended to be handled. The original
> g1 paper mentions an rset scanning thread that I suspect is intended
> to do rset scanning in the background, such that regions like these
> could be evicted more cheaply during the STW eviction pause. I didn't
> find such a thread anywhere in the source code, though I may very well
> just be missing it.
> / Peter Schuller
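
For readers following the thread, cases (1) and (2) in the quoted message can be illustrated with a toy model of budget-limited region selection. This is only a sketch under stated assumptions and is not HotSpot code: the class, the field names (garbageBytes, rsetScanMs, copyMs), the greedy "best pay-off/cost ratio first" policy and all the numbers are hypothetical, chosen just to show how a high estimated rset scanning cost can keep a mostly-garbage region out of every pause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CollectionSetSketch {
    /** Hypothetical per-region estimates; names are illustrative, not HotSpot's. */
    public static final class Region {
        final String name;
        final long garbageBytes;  // bytes that evicting the region would reclaim
        final double rsetScanMs;  // estimated remembered set scanning cost
        final double copyMs;      // estimated cost of copying the live data

        Region(String name, long garbageBytes, double rsetScanMs, double copyMs) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.rsetScanMs = rsetScanMs;
            this.copyMs = copyMs;
        }

        double costMs() { return rsetScanMs + copyMs; }

        double payoffPerMs() { return garbageBytes / costMs(); }
    }

    /** Greedy pick: best pay-off/cost ratio first, skipping anything over budget. */
    public static List<Region> select(List<Region> candidates, double pauseGoalMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Region::payoffPerMs).reversed());
        List<Region> chosen = new ArrayList<>();
        double budgetMs = pauseGoalMs;
        for (Region r : sorted) {
            if (r.costMs() <= budgetMs) {
                chosen.add(r);
                budgetMs -= r.costMs();
            }
        }
        return chosen;
    }

    /** Made-up numbers: A and B are mostly garbage but expensive to scan. */
    public static List<Region> demoRegions() {
        return Arrays.asList(
            new Region("A", 900_000, 25.0, 1.0),  // case (1): 26 ms alone exceeds a 20 ms goal
            new Region("B", 900_000, 15.0, 1.0),  // case (2): 16 ms fits alone, but gets crowded out
            new Region("C", 600_000, 2.0, 2.0),   // cheap region with a better ratio
            new Region("D", 500_000, 2.0, 2.0));  // cheap region with a better ratio
    }

    public static List<String> names(List<Region> regions) {
        List<String> out = new ArrayList<>();
        for (Region r : regions) {
            out.add(r.name);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("goal 20 ms: " + names(select(demoRegions(), 20.0)));
        System.out.println("goal 50 ms: " + names(select(demoRegions(), 50.0)));
    }
}
```

In this model, a 20 ms goal never picks A (too expensive on its own) or B (outcompeted by C and D, which use up the budget first), while a 50 ms goal picks all four. That matches the observation that raising the pause time goal mitigates the effect without really fixing it: if fresh cheap regions like C and D keep appearing, B can be starved at any fixed goal.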

