JEP 173: Remove Rarely-Used Combinations of Garbage Collectors

Fri Dec 7 17:22:56 UTC 2012

Hi all,

We are not very pleased with the discussion about the iCMS removal. We are successfully using iCMS mode on 300+ production machines running a "low"-latency trading system. iCMS mode gives us behavior that normal CMS mode is not able to replicate, and G1 would crash as soon as it could see an opportunity to crash.

Incremental CMS mode 
+ keeps old gen collected and ready to accommodate new objects 
+ regularly collected old gen reduces fragmentation risk
+ cycle interval can be controlled via the duty cycle (and the number of CMS threads??)
+ automatic pacing reacts to the promotion rate dynamically and reduces promotion failures
+ weak references are processed regularly (a mandatory requirement of our data distribution framework and design patterns - weak listeners, weak caches etc.)
+ finalizers are called regularly (just useful)
+ no CPU peaks, as marking runs permanently (duty cycle > 0), which has a constant and predictable CPU/memory bandwidth cost for the application
+ regular overview of the live set size
- requires more CPU (thread yields, extra code for handling increments, etc.)
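
To illustrate why regular weak reference processing matters for the weak-listener pattern, here is a minimal sketch. The class WeakListenerRegistry and its Listener interface are hypothetical names made up for this example; they are not taken from the framework mentioned above. Entries whose listener is no longer used stay in the list until a collection of the generation holding that listener clears the weak reference, so a rarely collected old gen lets stale entries pile up.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical weak-listener registry; all names are illustrative only. */
public class WeakListenerRegistry {

    /** Callback interface assumed for this sketch. */
    public interface Listener {
        void onEvent(String event);
    }

    private final List<WeakReference<Listener>> listeners =
            new ArrayList<WeakReference<Listener>>();

    /** Registers a listener without keeping it strongly reachable. */
    public void register(Listener listener) {
        listeners.add(new WeakReference<Listener>(listener));
    }

    /**
     * Notifies all listeners that are still alive and drops entries whose
     * referent has been cleared by the collector.
     * Returns the number of listeners notified.
     */
    public int publish(String event) {
        int notified = 0;
        Iterator<WeakReference<Listener>> it = listeners.iterator();
        while (it.hasNext()) {
            Listener listener = it.next().get();
            if (listener == null) {
                it.remove(); // the GC has cleared this reference
            } else {
                listener.onEvent(event);
                notified++;
            }
        }
        return notified;
    }

    /** Number of entries, including any the GC has not cleared yet. */
    public int size() {
        return listeners.size();
    }
}
```

The registry only shrinks when publish() finds a cleared reference, and a reference to an object that was promoted to old gen is cleared only by an old gen collection.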

G1
- 2009 hotspot crash
- 2010 hotspot crash
- 2011 hotspot crash
- 2012 hotspot crash
- 2013 ???

Just recently we fixed a bug in the CMSWaitDuration implementation and made the waiting functionality available for iCMS mode. Our iCMS STW pauses dropped by a factor of 10 and are now not much longer than regular ParNew pauses. If someone could additionally fix the invalid CMS-remark time reporting, so that the real and better STW pauses are presented, the CMS collector (including iCMS) could still be considered a very competitive alternative to the unstable G1 collector, and there would not be any thoughts about "Removing Rarely-Used Combinations of Garbage Collectors".

http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-August/001297.html
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004892.html

We can provide 200 various iCMS GC log files produced every day (JDK 1.6.>30, OpenJDK 1.6.x patch 7189971, soon OpenJDK 1.7.x patch 7189971, Solaris/x86, Linux, heaps 4-32GB). 

Best regards,
Michal

From: hotspot-gc-dev-bounces at openjdk.java.net
To: "Jon Masamitsu" jon.masamitsu at oracle.com
Cc: hotspot-gc-dev at openjdk.java.net
Date: Fri, 7 Dec 2012 17:20:56 +0100
Subject: Re: JEP 173: Remove Rarely-Used Combinations of Garbage Collectors

> Hi Jon,
> 
> Indeed, I do believe a bug is involved in one case.. but I see many cases of iCMS being used. And maybe it's a bug for the other cases also. I can't say because all I've got in those cases is a GC log. And maybe in those cases iCMS was used for no apparent reason. All I wanted to do is point out that iCMS *is* being used. I can't say the same about the other collectors on the chopping block. At the end of the day I'm just a little voice here that isn't obliged to make any of this work. All I can do is let you know what I'm seeing out in the wild. At the end of the day, the decision lies elsewhere... and that's ok ;-)
> 
> Regards,
> Kirk
> 
> On 2012-12-07, at 4:44 PM, Jon Masamitsu  wrote:
> 
> > Kirk,
> > 
> > If iCMS is outperforming CMS on big apps on big machines,
> > that's a bug in CMS.  Case in point is the CMSWaitDuration
> > fix that is in review now.   I certainly understand that customers
> > have to work with what they are given and some
> > unexpected paths have been taken.   Hopefully, we can
> > do better.
> > 
> > Jon
> > 
> > On 12/6/2012 12:58 PM, Kirk Pepperdine wrote:
> >> Hi all,
> >> 
> >> I started an email this morning trying to outline why iCMS is better than CMS but I then needed to start presenting my performance tuning seminar. I'll start again and kill the draft. So to Ramki's request, the simple reason is that having iCMS run keeps memory cleaner, so they never run into situations where they experience very long pause times. With CMS they always ran into a situation where the pause times exceeded response time requirements (large trading application). This was but one use case. Unfortunately I only have production iCMS logs from this client. I've also received (via my send me your GC log program) a number of other logs from people with similar low latency requirements. I must say that I was quite frankly surprised to see iCMS showing up in the logs and that it was being used in so many places. That said, the results were also surprisingly very good given that this use case (24 cores, 32g heap) runs counter to the original intent. But then I've seen a number of "odd ball" CMS configurations that also make sense in hindsight.
> >> 
> >> As for a reference app... very difficult to come up with... I've been picking at "reference" apps for a while.. and the problem is.. as soon as I get a good one.. you guys do something in the JVM that suddenly makes it very disappointing to use the app for a performance tuning demonstration. So, instead of answering Ramki, I had to de-tune one of my reference apps that was running perfectly horribly until I started using 1.7.0_05... Can I ask you all to please STOP DOING THAT ;-) ....  stop making it harder to de-tune my reference app!!!!
> >> 
> >> Seriously, the biggest problem that I see right now is adaptive sizing. The default parallel collector combination works brilliantly except that adaptive sizing leaves survivor spaces undersized. This undersizing often leads to an increased frequency of full collections.  I'm very much interested in looking at how this might be corrected.
> >> 
> >> -- Kirk
> >> 
> >> On 2012-12-06, at 2:23 PM, Bengt Rutisson  wrote:
> >> 
> >>> Hi Kirk and Ramki,
> >>> 
> >>> On 12/6/12 12:05 AM, Srinivas Ramakrishna wrote:
> >>>> I am thinking that if we have a "test case" or publicly available application that can serve as a "witness" to this, it would
> >>>> allow us to learn a few useful things on how regular CMS might do better for such apps, and understand the basis of
> >>>> this difference. (Unless you have already analysed it and can share your summary of it.)
> >>> Yes, I totally agree with this. If there are cases where i-CMS is better than regular CMS we need to understand why and should try to get CMS to perform as well (or better). This is a much more appealing solution to me than to keep the extra complexity that i-CMS introduces.
> >>> 
> >>> Kirk, if you have log files of runs with CMS and i-CMS it would be great if you can pass them along. I would be very interested in analyzing why i-CMS would perform better than CMS.
> >>> 
> >>> Thanks,
> >>> Bengt
> >>> 
> >>>> thanks!
> >>>> -- ramki
> >>>> 
> >>>> On Wed, Dec 5, 2012 at 3:01 PM, Srinivas Ramakrishna  wrote:
> >>>> Hi Kirk --
> >>>> 
> >>>> On Wed, Dec 5, 2012 at 2:48 PM, Kirk Pepperdine  wrote:
> >>>> Hi all,
> >>>> 
> >>>> 
> >>>> The JEPs are coming in fast and furious. There is a customer use case for iCMS.. it's used by low latency applications... and quite successfully in fact. iCMS manages large heaps much better than CMS does, which translates into more manageable pause times... I've got logs from a number of customers that rely on iCMS.
> >>>> 
> >>>> This is very interesting indeed (and something I had vaguely heard a few years ago from the general grapevine, although never actually understood
> >>>> why it must be so). Could you go a bit deeper on why this is so? What exactly is it about doing a "slow, spread-out, incremental CMS collection"
> >>>> that makes it work better than bang-bang vanilla CMS in large multi-core, server environments? Perhaps the insights from that might translate into
> >>>> something useful for vanilla CMS?
> >>>> 
> >>>> Your experience does indicate that we must proceed with some caution here before we deprecate iCMS, given it might still have some useful life
> >>>> (notwithstanding my own instincts to the contrary -- in server environments -- expressed in an earlier email before I had seen yours).
> >>>> 
> >>>> thanks.
> >>>> -- ramki
> >>>> 
> >>>> 
> >>>> Regards,
> >>>> Kirk
> >>>> 
> >>>> 
> >>>> On 2012-12-05, at 11:10 PM, mark.reinhold at oracle.com wrote:
> >>>> 
> >>>>> Posted: http://openjdk.java.net/jeps/173
> >>>>> 
> >>>>> - Mark
> >>>> 
> >>>> 
> >> 
> > 
>