G1 as a CMS replacement

Thomas Schatzl thomas.schatzl at oracle.com
Mon Jun 8 07:21:16 UTC 2015


Hi Kirk,

On Mon, 2015-06-08 at 00:08 +0200, Kirk Pepperdine wrote:
> Hi John,
> 
[...]
> I would not expect there to be much to gain by collecting, or should I say
> evacuating, a region that is >85% full. However, dumping the 15% of cruft in
> these uncollectable regions always seems to produce a win.
> 
> My naive idea was to somehow sweep the references for the floating garbage
> out of the RSet. The space would be recovered when the region was finally
> evacuated. I have no idea how viable this idea is or even if the cost

When iterating over the remembered sets, objects known to be dead are
skipped, i.e. their references are not followed.

Liveness (and hence deadness) is determined by concurrent marking, so
regular marking keeps the amount of such "zombies" small.
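
To make the skipping concrete, here is a minimal sketch with invented
types and names -- this is not the actual HotSpot code, just the shape
of the idea:

  // Minimal sketch, invented types/names -- not HotSpot source.
  // Shows how an RSet-driven card scan can skip objects that the last
  // completed concurrent marking proved dead, so their stale
  // references are never followed.
  #include <vector>

  struct Obj {
    bool alloc_before_mark;    // allocated before the last marking began?
    std::vector<Obj*> refs;    // outgoing references
  };

  struct MarkBitmap {
    // Set by concurrent marking; stubbed here for illustration.
    bool is_live(const Obj*) const { return true; }
  };

  void scan_card(const std::vector<Obj*>& objs_on_card,
                 const MarkBitmap& bitmap) {
    for (Obj* o : objs_on_card) {
      // Known dead: predates the last marking and was never marked
      // live. Its references are not followed at all.
      if (o->alloc_before_mark && !bitmap.is_live(o)) continue;
      for (Obj* target : o->refs) {
        (void)target;  // ... process the reference here ...
      }
    }
  }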

> justifies the expense. What I’ve found in a number of situations is that if
> you have enough cores and you want better pause times, it’s better to keep
> the concurrent cycles running more frequently. When I configured the CMS
> collector to cope with a Scala compile (single-threaded at the time) I
> managed to reduce the compile time by >30% (from 10 minutes to just over 6).
> I’ve managed similar results with other applications, and I’ve noticed that
> a number of trading applications (financials and ads) have been configuring
> CMS’s IOF (initiating occupancy fraction) so that the collector is
> practically running continuously. My current guess is that we should be able
> to see the same types of improvements with G1 by configuring it to soak up
> cores that aren’t used by the application. But in order to see gains I
> believe we have to improve the management of RSets.
> 
> IME you can’t sort this out in the small. If you want to tune for large heaps
> with a reasonable rate of churn, you need an app that is large and has a
> reasonable rate of churn. In my case this translates to: I have to rely on the
> generosity of my customers to allow me to experiment. So, my biggest challenge
> is simply getting access to enough applications whose teams will allow me to
> experiment.
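
(To make the configuration above concrete: the "practically running
continuously" CMS setup is usually spelled with flags along the
following lines, and the G1 analogue mostly comes down to a lower
marking threshold plus more concurrent threads. Values are made up for
illustration, not recommendations.)

  # CMS: start concurrent cycles early, and only honor that threshold
  java -XX:+UseConcMarkSweepGC \
       -XX:CMSInitiatingOccupancyFraction=40 \
       -XX:+UseCMSInitiatingOccupancyOnly \
       -XX:ConcGCThreads=4 ...

  # G1: start marking earlier so spare cores do more of the concurrent work
  java -XX:+UseG1GC \
       -XX:InitiatingHeapOccupancyPercent=25 \
       -XX:ConcGCThreads=4 ...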

(some more comments below)

> On Jun 7, 2015, at 11:01 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>
> > On 6/7/2015 1:20 AM, Kirk Pepperdine wrote:
[...]
> >> I’m not sure that a static IHOP is really the problem. From what I can
> >> see, it’s the accumulated cruft in regions that are not deemed “ripe”
> >> enough to sweep that is the bigger problem. From the GC logs I’m getting
> >> from various customers trying to use G1, where periodic calls to full
> >> collections are made, I can see that when this cruft gets cleaned up
> >> there is a corresponding and proportional drop in subsequent young
> >> collection pause times.

[...]

> > When you say cruft you mean live data spread throughout the heap, right?
> > You're not talking about some side effect of floating garbage.
> > 
> >> 
> >> Since this drop in pause times all seems to be connected to RSet
> >> refinement/processing, it would seem to suggest that there might be
> >> some benefit if somehow the RSets of target young regions could be
> >> cleaned during one of the concurrent phases of the concurrent mark
> >> in tenured. Maybe there could be a concurrent sweep (without the
> >> evacuation) phase added at the end of the cycle that could simply
> >> clean the RSets of the pointers coming from said cruft. A full
> >> evacuation of a region would still be the domain of the young gen
> >> sweep.
> > 
> > I'm going to have to read some code tomorrow to see what cleaning is done,
> > but one side effect of a full GC could be that the "cruft" that was spread
> > out over 10 regions is compacted into 1 region.

That is one concern.

> > That would affect RSets such that a young region collected before the full
> > GC would have RSets for 10 regions. A collection of a young region after
> > the full GC might only have

During a full GC, RSets are recreated from scratch, so there is no
cruft in them afterwards.

Also, a full GC typically allows a larger eden afterwards, which might
decrease young GC times if the object death rates are right.
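
As a back-of-the-envelope illustration of that eden effect (numbers are
made up):

  eden 512 MB at 1 GB/s allocation rate  ->  a young GC every ~0.5 s
  eden   2 GB at 1 GB/s allocation rate  ->  a young GC every ~2 s

Young pause length is dominated by the surviving bytes rather than by
eden size, so if the death rates hold, individual pauses stay roughly
flat while the total time spent in young GCs drops by about 4x.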

> > an RSet for 1 region (where all the cruft is). Is this a possible
> > interpretation of what you're seeing? If not, as I said, I'll look at
> > what's done in terms of precleaning and get back. Thanks for asking.
> > 

The most likely explanation for this behavior is that, in the internal
representation, remembered set contents only ever get coarsened, i.e.
moved to data structures that take less space but more time to find the
actual references in.

Even if a particular remembered set becomes mostly empty (completely
empty ones are removed, of course), G1 never tries to go the other way,
i.e. back to a representation that would mean less work when the
remembered set is scanned.
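
To sketch what "only ever coarsened" means in practice -- invented
names and thresholds below, the real code lives in HeapRegionRemSet
and friends:

  // Minimal sketch, invented names/thresholds -- not HotSpot source.
  // A per-from-region remembered set that only ever moves to a coarser
  // representation: cheaper to store, but costlier to scan.
  #include <bitset>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  enum class Kind { Sparse, Fine, Coarse };

  struct FromRegionRSet {
    static const std::size_t kSparseLimit = 8;     // invented threshold
    static const std::size_t kFineLimit   = 1024;  // invented threshold

    Kind kind = Kind::Sparse;
    std::vector<std::uint16_t> sparse;  // few card indices, cheap to scan
    std::bitset<2048> fine;             // one bit per card in the region

    void add_card(std::uint16_t card) {
      if (kind == Kind::Sparse) {
        if (sparse.size() < kSparseLimit) { sparse.push_back(card); return; }
        // Overflow: coarsen Sparse -> Fine. There is no transition back,
        // even if most of these cards later hold no references at all.
        for (std::uint16_t c : sparse) fine.set(c);
        sparse.clear();
        kind = Kind::Fine;
      }
      if (kind == Kind::Fine) {
        if (fine.count() >= kFineLimit) {
          // Too many cards: coarsen Fine -> Coarse. From now on the only
          // record is "that region may point here", and scanning it
          // means walking the entire from-region.
          kind = Kind::Coarse;
          return;
        }
        fine.set(card);
        return;
      }
      // Coarse: nothing to record; nothing ever un-coarsens short of
      // the remembered set being rebuilt from scratch (e.g. full GC).
    }
  };

Which also explains why a full GC "fixes" young pause times for a
while: rebuilding the RSets from scratch is currently the only way back
down to the cheaper representations.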

Thanks,
  Thomas
