JEP 248: Make G1 the Default Garbage Collector
Kirk Pepperdine
kirk at kodewerk.com
Tue Jun 2 12:19:34 UTC 2015
Hi,
I can show you a direct correlation between young-gen pauses and the occupancy of Tenured. In fact, please find attached graphs from work I did on Monday. What I do know is that under normal operation there are regions that I wish could be collected but can't be, because they don't meet the thresholds to be included in the CSet. That said, the amount of dead data in each of those tenured regions is enough that, overall, there is a visible negative effect on GC pause times. So I think I'd like a continuous concurrent cleaning of tenured regions, or something else that I've not quite thought through just yet. What I also know is that if I run a full GC, the pause-time picture clears up, because the regions that wouldn't otherwise get collected finally do, and that reduces the "live set". Somewhere tucked away I've got other logs that demonstrate this effect; I'll see if I can dig them up.
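For reference, the thresholds I mean are the ones that decide whether an old region is even a candidate for a mixed collection, and when G1 stops bothering with mixed collections at all. A rough sketch of the relevant knobs, with illustrative values rather than a recommendation for this workload (the defaults have moved around between JDK versions):

  -XX:+UnlockExperimentalVMOptions
  -XX:G1MixedGCLiveThresholdPercent=85    # old region is a CSet candidate only if at most 85% live
  -XX:G1HeapWastePercent=5                # stop mixed collections once reclaimable space < 5% of the heap
  -XX:G1OldCSetRegionThresholdPercent=10  # cap on how many old regions one mixed collection will take
  -XX:G1MixedGCCountTarget=8              # spread the candidate regions over roughly 8 mixed collections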
Regards,
Kirk
PS, once we’re finished with this conversation I’d like to tackle safe-pointing behavior. I think Charlie knows what I’m talking about.
On Jun 2, 2015, at 7:38 AM, Yu Zhang <yu.zhang at oracle.com> wrote:
> Jeremy,
>
> Thanks for the information on your case. Very interesting, and it might be typical of how users are using CMS.
>
> If we keep fragmentation out of the picture, it boils down to comparing G1 and CMS young GCs. In your experience, what is the dominant cost of a G1 young GC?
>
> Thanks,
> Jenny
>
> On 6/1/2015 10:21 PM, Jeremy Manson wrote:
>> On Mon, Jun 1, 2015 at 6:00 PM, Erik Österlund <erik.osterlund at lnu.se>
>> wrote:
>>
>>> Hi Jeremy,
>>>
>>> Are you suggesting making Google’s CMS the new default instead?
>>>
>> Not even a little bit. As I said, our experiences are just that - ours.
>> I'm more or less just saying that we have had much more luck improving CMS
>> than we have trying G1. Once every year or two, we ask ourselves the
>> question of whether we should focus our attention on G1, and the answer has
>> perennially been no.
>>
>>
>>> The target for this is long-running server applications, where
>>> fragmentation issues become increasingly awkward over time. The literature
>>> suggests fragmentation overheads can be as bad as allocations costing
>>> (1/2) log2(n) times as much memory, where n is the ratio between the
>>> largest and smallest allocatable object sizes. In short… ouch! This can
>>> make the JVM run out of memory and crash, which is suboptimal.
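>>> (For a rough sense of scale: if the smallest allocation is 16 bytes and
>>> the largest is 16 KB, then n = 1024 and (1/2) log2(1024) = 5, so a
>>> worst-case workload could need something like five times its live data
>>> just to keep satisfying allocations.)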
>>> So I’m curious - what’s the Google solution to fragmentation using CMS?
>>> Let me guess… buy more memory? :p
>>>
>> Google scale is such that *any* increased use of memory on a per-server
>> basis costs an enormous amount, when multiplied by the number of servers
>> we're running. We very aggressively keep heap footprints as small as
>> possible. We even give unused space in the heap back to the OS, which
>> saves us huge amounts of RAM across Google's servers, but is another patch
>> that Oracle doesn't want.
>>
>> For all of this talk of larger heaps - anything larger than single-digit GB
>> is an outlier for our Java jobs, and we would never consider switching the
>> default to make those kinds of jobs better.
>>
>> Users who really care about GC behavior design their systems so that they
>> either don't see fragmentation issues, or so that periods of unavailability
>> are acceptable. Some tune things so that the CMS generation basically only
>> contains objects that live forever, so CMS cycles (and the resulting
>> fragmentation) are rare. Aggressive users even have their admins paged when
>> their services do a full compacting collection in the CMS generation, and
>> consider it a major regression.
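>> (For a rough idea of the flag-level shape of that kind of tuning, with
>> illustrative values only, not anyone's actual settings:
>>
>>   -Xmn2g                                  # young gen sized so most allocation dies young
>>   -XX:MaxTenuringThreshold=6              # keep short-lived objects from being promoted
>>   -XX:CMSInitiatingOccupancyFraction=75   # start CMS cycles at a predictable occupancy
>>   -XX:+UseCMSInitiatingOccupancyOnly      # and only at that occupancy, not on heuristics
>>
>> the idea being that the CMS generation holds effectively-immortal data, so
>> concurrent cycles, and the fragmentation they leave behind, stay rare.)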
>>
>> Fragmentation *can* be a problem, of course. We've responded to it by
>> doing / attempting a few things:
>>
>> Simply optimizing the existing code can help a great deal. For example,
>> for users who don't want to have their pager go off when they do a full
>> compaction, we've parallelized full compacting collection of the CMS
>> generation, so that it is much closer to the speed of the parallel old GC.
>> Hotspot currently falls back to an insanely slow serial collection in this
>> case, which was unacceptable for us. This (in concert with other
>> optimizations) has significantly improved long-tail latencies.
>>
>> We have some users who don't mind OOMEs caused by GC thrashing so much, as
>> long as they happen in a timely fashion. The current metrics don't really
>> allow the OOME to be thrown in a timely way when the GC is thrashing, so
>> we've tweaked that.
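>> (For context, the stock HotSpot knob in this area is the GC overhead limit,
>> roughly:
>>
>>   -XX:+UseGCOverheadLimit   # throw OOME when both limits below are exceeded
>>   -XX:GCTimeLimit=98        # more than 98% of total time spent in GC
>>   -XX:GCHeapFreeLimit=2     # while less than 2% of the heap is being recovered
>>
>> with those defaults it can take a very long time for the OOME to actually
>> fire, which is the kind of policy the tweak is aimed at.)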
>>
>> We also export fragmentation metrics from Hotspot, so that our users can
>> identify problematic behaviors. We export a ton of other metrics about
>> what's in the heap and about garbage collection statistics, which lets
>> people keep a pretty close eye on these issues.
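>> (The fragmentation metrics themselves are our patches, but the stock JDK
>> management beans already expose the basic per-collector and per-pool
>> numbers. A minimal sketch, with the class name GcStatsDump purely for
>> illustration:
>>
>>   import java.lang.management.GarbageCollectorMXBean;
>>   import java.lang.management.ManagementFactory;
>>   import java.lang.management.MemoryPoolMXBean;
>>
>>   public class GcStatsDump {
>>       public static void main(String[] args) {
>>           // Approximate cumulative collection counts and elapsed time per collector,
>>           // e.g. "ParNew" and "ConcurrentMarkSweep" when running CMS.
>>           for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
>>               System.out.printf("%s: %d collections, %d ms total%n",
>>                       gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
>>           }
>>           // Current occupancy of each memory pool, e.g. "CMS Old Gen".
>>           // Note that getMax() can be -1 if the pool has no defined maximum.
>>           for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
>>               System.out.printf("%s: %d used / %d max bytes%n",
>>                       pool.getName(), pool.getUsage().getUsed(), pool.getUsage().getMax());
>>           }
>>       }
>>   }
>>
>> That only covers the standard numbers; the fragmentation metrics need the
>> extra patches.)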
>>
>> At one point, we tried to do partial compaction during the mark phase, but
>> it was so expensive that we didn't feel comfortable inflicting it on our
>> users - it would have helped worst case behavior, and pretty much got rid
>> of full compacting collections, but would have made latencies for well
>> tuned services significantly worse. We thought about having it be opt-in,
>> and then we realized that anyone who cared enough about their systems to
>> opt into something like that probably cared enough to fix it so that
>> fragmentation wouldn't be a problem.
>>
>> I'm probably forgetting some other things. :)
>>
>> Jeremy
>
[Attachment: Screen Shot 2015-06-02 at 2.10.01 PM.png (image/png): <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150602/5d05c7cd/ScreenShot2015-06-02at2.10.01PM.png>]
[Attachment: Screen Shot 2015-06-02 at 2.10.11 PM.png (image/png): <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150602/5d05c7cd/ScreenShot2015-06-02at2.10.11PM.png>]