JEP 189: Shenandoah: An Ultra-Low-Pause-Time Garbage Collector

Christine Flood chf at redhat.com
Mon Jan 20 21:17:14 UTC 2014



----- Original Message -----
> From: "Kirk Pepperdine" <kirk at kodewerk.com>
> To: "Christine Flood" <chf at redhat.com>
> Cc: "Krystal Mok" <rednaxelafx at gmail.com>, "hotspot-gc-dev at openjdk.java.net openjdk.java.net"
> <hotspot-gc-dev at openjdk.java.net>
> Sent: Monday, January 20, 2014 3:52:54 PM
> Subject: Re: JEP 189: Shenandoah: An Ultra-Low-Pause-Time Garbage Collector
> 
> 
> On Jan 20, 2014, at 9:13 PM, Christine Flood <chf at redhat.com> wrote:
> 
> > Concurrent Mark and Sweep falls back on full GCs because the old
> > generation gets too fragmented.
> 
> Technically speaking, CMS falls back on a full GC because of an allocation
> failure either from a ParNew thread or a mutator thread creating an object
> larger than 1/2 of Eden. This may or may not be due to fragmentation
> (although fragmentation is the more common case). I’ve seen it go either
> way.
> 
> > G1 falls back on full GCs because it is trying so hard to hit its pause
> > time goals that it can't keep up.
> > 
> > We have a lot of work to do on heuristics for Shenandoah, but theoretically
> > by allocating more threads to the garbage collector we can gently slow
> > down the mutator enough for us to keep up.  The only reasons I see for a
> > full GC fallback position are if we have a humongous object that we don't
> > have enough contiguous space for or if the user explicitly asks for a
> > System.gc().
> 
> This is one place where G1 can fall over. If you consider the odds of finding
> enough contiguous free regions to satisfy a humongous allocation, you see that
> it’s quite likely you’ll fail to do so. Unless you’re careful, the odds of a
> collection correcting this problem are better, but not as high as one might
> like them to be. This might be an implementation detail, and you might be able
> to avoid failure if the collector has a contiguous-region goal to meet and is
> able (if need be) to evacuate a “non-ripe” region to meet that goal.
> 
> > 
> > I reserve the right to reverse this position once we have more experience
> > with Shenandoah.
> 
> Which we all happily grant :-)
> 
> BTW, the history of G1 is that it started without a generational partition,
> but it was quickly determined that the gains to be made by having a separate
> nursery were too great to ignore. IBM also tried to keep a single space for
> quite some time but eventually backed out to a generational arrangement. My
> question is: what is it in this collector that makes you believe you can
> ignore the weak generational hypothesis?
> 

The benefit of generational systems is as much about remembered set size as it is about object lifetimes.  Generational G1 is a win because you don't need to keep track of pointers from young regions to older regions.  The young regions are collected at every GC pause and will be traced then.  
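
To make the remembered-set point concrete, here is a rough C sketch (invented
names, not HotSpot code) of the only store a generational collector has to
record: an old object being made to point at a young one.  Stores whose holder
is in the young generation need no record at all, because the young regions are
traced at every pause anyway.

    #include <stdbool.h>
    #include <stddef.h>

    /* Illustrative layout: one boundary splits the heap into young and old. */
    typedef struct obj { struct obj *field; } obj;

    static char *young_start, *young_end;      /* assumed set up by the heap */

    static bool in_young(obj *o) {
        return (char *)o >= young_start && (char *)o < young_end;
    }

    static void remember(obj *holder) {
        /* dirty a card / add holder to a remembered set (omitted) */
        (void)holder;
    }

    /* Post-write barrier for "holder->field = value". */
    static void write_ref(obj *holder, obj *value) {
        holder->field = value;
        /* Only old->young stores need a record: a young collection must be
           able to find them without scanning the whole old generation. */
        if (value != NULL && !in_young(holder) && in_young(value))
            remember(holder);
    }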

Our algorithm doesn't require remembered sets because reference updates happen lazily.  We don't need to be able to find every reference to an object and update it; we can wait for the concurrent marking thread to find those references and update them for us.  This gives us a different cost/benefit analysis.  I believe Shenandoah fulfills the promise of G1 by concentrating GC effort where the garbage is, whether or not that is in young regions.
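
One way to picture the lazy updates is a Brooks-style forwarding pointer in
each object.  The sketch below is purely illustrative (invented names, not the
actual Shenandoah code): every access resolves through the forwarding word, so
an evacuation only flips that one word in the old copy, and concurrent marking
can rewrite each stale reference whenever it happens to visit it.

    #include <stddef.h>
    #include <string.h>

    typedef struct obj {
        struct obj *forwardee;     /* points to self, or to the evacuated copy */
        /* ... payload ... */
    } obj;

    /* Read barrier: every access goes through the forwarding pointer. */
    static obj *resolve(obj *ref) {
        return ref == NULL ? NULL : ref->forwardee;
    }

    /* Evacuation updates one word in the old copy; no other references are
       touched at this point. */
    static void evacuate(obj *from, obj *to) {
        memcpy(to, from, sizeof *to);
        to->forwardee = to;
        from->forwardee = to;      /* stale references now resolve to the copy */
    }

    /* Concurrent marking fixes up a stale slot when it gets there. */
    static void update_slot(obj **slot) {
        if (*slot != NULL)
            *slot = (*slot)->forwardee;
    }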

There is no distinction in Shenandoah between allocating in a separate eden and allocating in an empty region; you would perform exactly the same work either way.
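
For what it's worth, a minimal sketch of that allocation path (invented names,
purely illustrative): bump-pointer allocation in a region a thread has claimed
looks the same whether or not you choose to call that region "eden".

    #include <stddef.h>

    typedef struct region {
        char *top;      /* next free byte in the region */
        char *end;      /* end of the region            */
    } region;

    /* Bump-pointer allocation; the caller claims a fresh empty region when
       this returns NULL. */
    static void *region_alloc(region *r, size_t size) {
        size = (size + 7) & ~(size_t)7;          /* 8-byte alignment */
        if ((size_t)(r->end - r->top) < size)
            return NULL;
        void *result = r->top;
        r->top += size;
        return result;
    }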


> As an aside, I have no bias for generational collectors but I have seen
> places where having a continuously running collector would have been very
> advantageous. For example, I had one case where we had a hard 16ms time budget
> on bursts of fixed units of work. In that case the work could be completed in
> 5 or 6 ms, which left 10ms for GC (should it decide to run, and it undoubtedly
> would have run in the middle of the 5-6ms burst of work), but we could never
> get the collector to run consistently in under 10ms without risking an OOME.
> I’m pretty convinced that a continuous collection policy would have helped us
> meet that goal, and we had plenty of CPU to throw at the problem so….
> 
> Regards,
> Kirk
> 
> 


