JEP 248: Make G1 the Default Garbage Collector

Tue Jun 2 05:38:46 UTC 2015

Jeremy,

Thanks for the information on your case.  Very interesting, and it might be
typical of how users are using CMS.

If we keep fragmentation out of the picture, it boils down to comparing
G1 and CMS young GC.  From your experience, what is the dominant cost
of a G1 young GC?

Thanks,
Jenny

On 6/1/2015 10:21 PM, Jeremy Manson wrote:
> On Mon, Jun 1, 2015 at 6:00 PM, Erik Österlund <erik.osterlund at lnu.se>
> wrote:
>
>>   Hi Jeremy,
>>
>>   Are you suggesting making Google’s CMS the new default instead?
>>
> Not even a little bit.  As I said, our experiences are just that - ours.
> I'm more or less just saying that we have had much more luck improving CMS
> than we have trying G1.  Once every year or two, we ask ourselves the
> question of whether we should focus our attention on G1, and the answer has
> perennially been no.
>
>
>> The target for this is long-running server applications, where
>> fragmentation issues become increasingly awkward over time. The literature
>> suggests that, in the worst case, fragmentation can make a program need
>> about (1/2) log2(n) times as much memory as its live data, where n is the
>> ratio of the largest to the smallest allocatable object size. In short…
>> ouch! This can make the JVM run out of memory and crash, which is suboptimal.
>> So I’m curious - what’s the Google solution to fragmentation using CMS?
>> Let me guess… buy more memory? :p
>>
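To put the worst-case bound quoted above in concrete terms, here is a back-of-the-envelope calculation. It is a sketch that assumes a base-2 logarithm and ignores the lower-order terms of the published bound. The function name and the object sizes are hypothetical.

```python
import math


def worst_case_heap_factor(n: float) -> float:
    """Rough worst-case fragmentation factor, (1/2) * log2(n).

    n is the ratio of the largest to the smallest allocatable object size.
    Lower-order terms of the published bound are ignored, so this is only an
    order-of-magnitude estimate for large n.
    """
    if n < 1:
        raise ValueError("n must be >= 1 (largest size / smallest size)")
    return 0.5 * math.log2(n)


# Hypothetical sizes: 16-byte smallest object, 16 MiB largest object.
smallest = 16
largest = 16 * 1024 * 1024
n = largest / smallest  # 2**20
factor = worst_case_heap_factor(n)
print(f"n = {n:.0f}, worst-case heap ~ {factor:.1f}x live data")
```

With a size ratio of 2^20, the estimate comes to roughly 10x the live data. Keep in mind that the bound describes an adversarial worst case, not typical allocator behavior.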
> Google scale is such that *any* increased use of memory on a per-server
> basis costs an enormous amount, when multiplied by the number of servers
> we're running.  We very aggressively keep heap footprints as small as
> possible.  We even give unused space in the heap back to the OS, which
> saves us huge amounts of RAM across Google's servers, but is another patch
> that Oracle doesn't want.
>
> For all of this talk of larger heaps - anything larger than single-digit GB
> is an outlier for our Java jobs, and we would never consider switching the
> default to make those kinds of jobs better.
>
> Users who really care about GC behavior design their systems so that
> they either don't see fragmentation issues, or periods of unavailability
> are acceptable.  Some tune them so that the CMS generation basically only
> contains objects that live forever, so CMS cycles (and the resulting
> fragmentation) are rare.  Aggressive users even have their admins paged
> when their services do a full compacting collection of the CMS generation,
> and consider it a major regression.
>
> Fragmentation *can* be a problem, of course.  We've responded to it by
> doing (or attempting) a few things:
>
> Simply optimizing the existing code can help a great deal.  For example,
> for users who don't want to have their pager go off when they do a full
> compaction, we've parallelized full compacting collection of the CMS
> generation, so that it is much closer to the speed of the parallel old GC.
> HotSpot currently falls back to an insanely slow serial collection in this
> case, which was unacceptable for us.  This (in concert with other
> optimizations) has significantly improved long-tail latencies.
>
> Some users don't mind OOMEs caused by GC thrashing so much, as long as
> they happen in a timely fashion.  The current metrics don't really allow
> an OOME to be thrown promptly when the GC is thrashing, so we've tweaked
> that.
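HotSpot's built-in check is controlled by flags such as -XX:+UseGCOverheadLimit, -XX:GCTimeLimit and -XX:GCHeapFreeLimit. The email does not say how Google changed it. The following is therefore only a hypothetical sketch of what a "timely" OOME-on-thrashing check could look like, using a sliding window of wall-clock time. The class name, window length and thresholds are invented for illustration.

```python
from collections import deque


class ThrashingDetector:
    """Hypothetical GC-thrashing detector (not HotSpot's or Google's code).

    Over a sliding window of wall-clock time, report thrashing when the
    share of time spent in GC is too high AND the collections in the
    window are, on average, reclaiming too little of the heap.
    """

    def __init__(self, window_s=10.0, max_gc_share=0.9, min_freed_share=0.05):
        self.window_s = window_s                # length of the sliding window
        self.max_gc_share = max_gc_share        # e.g. 0.9 = 90% of time in GC
        self.min_freed_share = min_freed_share  # e.g. 0.05 = 5% of heap freed
        self._gcs = deque()  # (end_time_s, duration_s, freed_share) per GC

    def _evict(self, now_s):
        # Forget collections that finished before the window began.
        while self._gcs and self._gcs[0][0] < now_s - self.window_s:
            self._gcs.popleft()

    def record_gc(self, end_time_s, duration_s, freed_share):
        self._gcs.append((end_time_s, duration_s, freed_share))
        self._evict(end_time_s)

    def should_throw_oome(self, now_s):
        self._evict(now_s)
        if not self._gcs:
            return False
        gc_share = sum(d for _, d, _ in self._gcs) / self.window_s
        avg_freed = sum(f for _, _, f in self._gcs) / len(self._gcs)
        return gc_share >= self.max_gc_share and avg_freed <= self.min_freed_share


# Ten back-to-back collections, each taking 0.95 s and freeing only 1% of the heap.
thrashing = ThrashingDetector()
for t in range(1, 11):
    thrashing.record_gc(end_time_s=float(t), duration_s=0.95, freed_share=0.01)
print(thrashing.should_throw_oome(now_s=10.0))  # True
```

A time window bounds how long a thrashing process can limp along before failing. That bound is the "timely" property the email asks for, though the real tweak may work quite differently.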
>
> We also export fragmentation metrics from HotSpot, so that our users can
> identify problematic behaviors.  We export a ton of other metrics about
> what's in the heap and about garbage collection statistics, allowing
> people to keep a pretty close eye on these issues.
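The email does not say which fragmentation metrics are exported. One simple, commonly used measure compares the largest free chunk with the total free space. The sketch below assumes the collector can report the sizes of the old generation's free chunks. The function name and output fields are hypothetical.

```python
def fragmentation_stats(free_chunk_sizes):
    """Summarize a free list (hypothetical metric, not HotSpot's export format).

    free_chunk_sizes: sizes in bytes of the free chunks in the old generation.
    Returns total free bytes, the largest chunk, and a fragmentation index in
    [0, 1): 0 means all free space is one contiguous chunk; values near 1 mean
    free space is scattered across many small chunks.
    """
    total = sum(free_chunk_sizes)
    if total == 0:
        return {"free_bytes": 0, "largest_chunk": 0, "fragmentation": 0.0}
    largest = max(free_chunk_sizes)
    return {
        "free_bytes": total,
        "largest_chunk": largest,
        "fragmentation": 1.0 - largest / total,
    }


# 1 MiB free, but split into 256 chunks of 4 KiB each.
stats = fragmentation_stats([4096] * 256)
print(stats)
```

An index like this shows why free-memory totals alone can mislead. In the example, 1 MiB is free, yet no allocation larger than 4 KiB can succeed without compaction.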
>
> At one point, we tried to do partial compaction during the mark phase, but
> it was so expensive that we didn't feel comfortable inflicting it on our
> users - it would have helped worst-case behavior and pretty much eliminated
> full compacting collections, but it would have made latencies for
> well-tuned services significantly worse.  We thought about making it
> opt-in, and then we realized that anyone who cared enough about their
> systems to opt into something like that probably cared enough to tune them
> so that fragmentation wouldn't be a problem.
>
> I'm probably forgetting some other things. :)
>
> Jeremy