Question regarding the need for From/To spaces (vs. simply compacting).

Wed Jun 23 15:17:34 UTC 2010

Dawid,

Hi. This is a good question that I get asked quite frequently. The main 
reason is that a copying GC is much faster in garbage collecting a space 
than a sliding compacting GC when the percentage of live objects is small.

A copying GC typically only visits the live objects in the space that is 
being GCed (it starts form the roots and keeps copying objects as it 
discovers them to be live). On the other hand a sliding compacting GC 
would typically visit both the live objects (twice!) and the garbage 
objects in the space that is being GCed (if first marks all the live 
objects, then sweeps over the space visiting all objects and moves the 
live ones). There are of course variations on these two techniques, but 
the above is the main trade-off between them.

So, if the space that is being GCed is sparse with live objects (and 
this is the assumption for the young generation), then a copying GC will 
always win in how much work it needs to do and how much memory it needs 
to access over a compacting GC. I had done some experiments in the past 
to see how good I could make a compacting GC and I couldn't get to 
anything better than twice as slow as a copying GC for the young gen. Of 
course, lots of things have changed since then (workloads / 
architectures / etc.) so maybe we need to redo those. But I would be 
surprised if I get largely different results today.

Anyway, hope this helps,

Tony, HS GC Group

Dawid Weiss wrote:
> Hello there,
>
> A friend of mine asked me a question that I couldn't find any sensible
> answer for both in literature, on the Web and by browsing through
> OpenJDK source code (but this, I admit, is not easy considering the
> volume of code there). The question is:
>
> "Why is the young gen. split between to/from spaces, so that the GC
> must copy everything back and forth between two memory regions if only
> a single monolithic space would suffice for compacting live objects?"
>
> The only thing I could tell him was that this design decision could be
> motivated by the fact that:
>
> - the implementation is simplified since the copied objects end up
> either in the TO space or are promoted somewhere else (in case of a
> generational GC),
> - the implementation may be paralellized to compact concurrently and
> then merge the compacted segments (assumption by looking at
> psYoungGen.cpp's move_and_update method).
>
> No other clues from me. If I'm missing something obvious (which is
> probably the case), then an RTFM with a pointer would be great.
>
> Dawid
>