G1 as a CMS replacement
kirk at kodewerk.com
Sun Jun 7 22:08:05 UTC 2015
By cruft I meant zombies and/or floating garbage. I’ve always considered floating garbage to be data that is dereferenced after the remark, but I guess with G1 that definition should be extended to include what I’ve called zombies. A zombie is a dead object that has been promoted because its root has not yet been recognized as dead, since the pool hasn’t been marked. Floating garbage may (or may not) cause zombies, but I reserve the term for data structures that are particularly prone to this phenomenon. From a GC implementation POV it might make little difference. From a diagnostic POV, though, it makes a huge difference, as each condition triggers a different remedy.
I would not expect there to be much to gain by collecting, or should I say evacuating, a region that is > 85% full. However, dumping the 15% of cruft in these uncollectable regions always seems to produce a win.
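For reference, that “ripeness” cutoff is tunable in current JDK 8 builds. A minimal sketch of the relevant flags (G1MixedGCLiveThresholdPercent is an experimental flag, and the values shown are purely illustrative, not a recommendation):

```shell
# Illustrative JDK 8 G1 flags, not a recommendation:
#   G1MixedGCLiveThresholdPercent: regions with <= this % live data are
#     candidates for the mixed-collection set (experimental flag)
#   G1HeapWastePercent: stop doing mixed collections once the reclaimable
#     space falls below this % of the heap
java -XX:+UseG1GC \
     -XX:+UnlockExperimentalVMOptions \
     -XX:G1MixedGCLiveThresholdPercent=85 \
     -XX:G1HeapWastePercent=5 \
     -jar app.jar
```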
My naive idea was to somehow sweep the references for the floating garbage out of the RSet. The space would be recovered when the region was finally evacuated. I have no idea how viable this idea is, or even if the benefit justifies the expense. What I’ve found in a number of situations is that if you have enough cores and you want better pause times, it’s better to keep the concurrent cycles running more frequently. When I configured the CMS collector to cope with a Scala compile (single threaded at the time) I managed to reduce the compile time by >30% (from 10 minutes to just over 6). I’ve managed similar results with other applications, and I’ve noticed that a number of trading applications (financials and ads) have been configuring CMS’s initiating occupancy fraction (IOF) so that it’s practically running continuously. My current guess is that we should be able to see the same types of improvements with G1 by configuring it to soak up cores that aren’t used by the application. But in order to see those gains, I believe we have to improve the management of RSets.
IME you can’t sort this out in the small. If you want to tune for large heaps with a reasonable rate of churn, you need an app that is large and has a reasonable rate of churn. In my case this translates to: I have to rely on the generosity of my customers to allow me to experiment. So, my biggest challenge is simply getting to enough applications where the teams will allow me to experiment.
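As an aside, when comparing customer runs like this I find it helps to pull the young pause times straight out of the logs so the drop after a full collection is easy to eyeball. A minimal sketch, assuming a JDK 8 -XX:+PrintGCDetails line format (the sample lines are invented; adjust the patterns for your JVM version and flags):

```shell
# Print young-collection pause times, with a marker at each Full GC,
# so before/after pause behaviour can be compared at a glance.
awk '
  /Full GC/             { print "---- Full GC ----" }
  /GC pause.*\(young\)/ {
    if (match($0, /, [0-9.]+ secs/))
      print substr($0, RSTART + 2, RLENGTH - 7)   # the seconds field
  }
' <<'EOF'
[GC pause (G1 Evacuation Pause) (young), 0.2501230 secs]
[Full GC (System.gc())  4096M->1024M(8192M), 3.1200000 secs]
[GC pause (G1 Evacuation Pause) (young), 0.0850000 secs]
EOF
```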
On Jun 7, 2015, at 11:01 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
> On 6/7/2015 1:20 AM, Kirk Pepperdine wrote:
>> Hi Jon,
>> I’ve been holding back on this because my thoughts are still not well formed but since we’re on the subject I thought I’d throw out my bad ideas in hopes that they inspire a better idea.
>>> I also know that the static initiating occupancy of G1 can be
>>> a hindrance and that the larger G1 process footprint is a
>>> disadvantage. Are either of those blocker for transition to
>>> G1 for some applications.
>> I’m not sure that a static IHOP is really the problem. From what I can see, it’s the accumulated cruft in regions that are not deemed “ripe” enough to sweep that is the bigger problem. From the GC logs I’m getting from various customers trying to use G1, where periodic calls to full collections are made, I can see that when this cruft gets cleaned up there is a corresponding and proportional drop in subsequent young collection pause times.
> When you say cruft you mean live data spread throughout the heap, right? You're not
> talking about some side effect of floating garbage.
>> Since this drop in pause times all seems to be connected to RSet refinement/processing, it would seem to suggest that there might be some benefit if somehow the RSets of target young regions could be cleaned during one of the concurrent phases of the concurrent mark in tenured. Maybe a concurrent sweep (without the evacuation) phase could be added at the end of the cycle that would simply clean the RSets of pointers coming from said cruft. A full evacuation of a region would still be the domain of the young gen sweep.
> I'm going to have to read some code tomorrow to see what cleaning is done but one side effect
> of a full GC could be that the "cruft" that was spread out over 10 regions is compacted into 1
> region. That would affect RSets such that a young region collected before the full GC would
> have RSets for 10 regions. A collection of a young region after the full GC might only have
> an RSet for 1 region (where all the cruft is). Is this a possible interpretation of what you're
> seeing? If not, as I said, I'll look at what's done in terms of precleaning and get back. Thanks
> for asking.
>> The current alternative is to simply make the collector more aggressive in how it selects regions. However, I feel tuning this way defeats the purpose or intent behind G1.