JEP 189: Shenandoah: An Ultra-Low-Pause-Time Garbage Collector
Kirk Pepperdine
kirk at kodewerk.com
Mon Jan 20 20:52:54 UTC 2014
On Jan 20, 2014, at 9:13 PM, Christine Flood <chf at redhat.com> wrote:
> Concurrent Mark and Sweep falls back on full GC's because the old generation gets too fragmented.
Technically speaking, CMS falls back on a full GC because of an allocation failure, either from a ParNew thread or from a mutator thread creating an object larger than half of Eden. This may or may not be due to fragmentation (although fragmentation is the more common case). I’ve seen it go either way.
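The "larger than half of Eden" condition can be sketched as a simple size check. This is purely illustrative (not HotSpot code), and the Eden size is a hypothetical number:

```java
// Illustrative sketch of the condition under which a mutator allocation
// cannot be satisfied from Eden, forcing CMS onto a slower path (old-gen
// allocation, and failing that, a full GC). All sizes are hypothetical.
public class EdenAllocSketch {
    static final long EDEN_BYTES = 64L * 1024 * 1024; // assumed 64 MB Eden

    // In this sketch, a request larger than half of Eden is never
    // attempted in the young generation.
    static boolean fitsInEden(long requestBytes) {
        return requestBytes <= EDEN_BYTES / 2;
    }

    public static void main(String[] args) {
        System.out.println(fitsInEden(16L * 1024 * 1024)); // small array: fits
        System.out.println(fitsInEden(48L * 1024 * 1024)); // > Eden/2: does not
    }
}
```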
> G1 falls back on full GC's because it is trying so hard to hit it's pause time goals that it can't keep up.
>
> We have a lot of work to do on heuristics for Shenandoah, but theoretically by allocating more threads to the garbage collector we can gently slow down the mutator enough for us to keep up. The only reasons I see for a full GC fallback position are if we have a humongous object that we don't have enough contiguous space for or if the user explicity asks for a System.gc().
This is one place where G1 can fall over. If you consider the odds of finding enough contiguous free regions to satisfy a humongous allocation, you'll see that it’s quite likely you’ll fail to do so. Unless you’re careful, the odds of a collection correcting this problem are better, but not as good as one might like. This might be an implementation detail: you might be able to avoid the failure if the collector has a contiguous-region goal to meet and is able (if need be) to evacuate a “non-ripe” region to meet that goal.
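A back-of-the-envelope simulation makes the contiguity point concrete. All the numbers here (region count, occupancy, run length, independence of regions) are hypothetical assumptions, not G1 behavior:

```java
import java.util.Random;

// Rough simulation: in a region-based heap where each region is occupied
// independently with some probability, how often does a run of `needed`
// contiguous free regions exist to back a humongous allocation?
public class ContiguousRegionOdds {

    // True if the occupancy bitmap contains a run of `needed` free slots.
    static boolean hasContiguousFree(boolean[] occupied, int needed) {
        int run = 0;
        for (boolean o : occupied) {
            run = o ? 0 : run + 1;
            if (run >= needed) return true;
        }
        return false;
    }

    // Monte Carlo estimate of P(some run of `needed` free regions exists)
    // given `regions` regions, each occupied with probability `occupancy`.
    static double estimate(int regions, double occupancy, int needed,
                           int trials, long seed) {
        Random rnd = new Random(seed);
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            boolean[] occ = new boolean[regions];
            for (int i = 0; i < regions; i++) {
                occ[i] = rnd.nextDouble() < occupancy;
            }
            if (hasContiguousFree(occ, needed)) hits++;
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        // Short runs are easy to find; long runs get rare fast, even at
        // only 50% occupancy.
        System.out.printf("50%% full, need 8 regions:  %.3f%n",
                estimate(2048, 0.50, 8, 10_000, 42));
        System.out.printf("50%% full, need 32 regions: %.3f%n",
                estimate(2048, 0.50, 32, 10_000, 42));
    }
}
```

With regions free independently at 50%, a specific window of 32 free regions has probability 0.5^32, so even thousands of candidate windows leave the overall odds near zero, while a run of 8 is almost always available; fragmentation scattered across the heap is exactly what breaks the long runs.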
>
> I reserve the right to reverse this position once we have more experience with Shenandoah.
Which we all happily grant :-)
BTW, the history of G1 is that it started without a generational partition, but it was quickly determined that the gains to be made by having a separate nursery were too great to ignore. IBM also tried to keep a single space for quite some time but eventually backed off to a generational arrangement. My question is: what is in this collector that makes you believe you can ignore the weak generational hypothesis?
As an aside, I have no bias toward generational collectors, but I have seen places where having a continuously running collector would have been very advantageous. For example, in one case we had a hard 16ms time budget on bursts of fixed units of work. The work could be completed in 5 or 6 ms, which left 10ms for GC (should it decide to run, and it undoubtedly would have run in the middle of the 5-6ms burst of work), but we could never get the collector to run consistently in under 10ms without risking an OOME. I’m pretty convinced a continuous collection policy would have helped us meet that goal, and we had plenty of CPU to throw at the problem, so….
Regards,
Kirk