JEP 173: Remove Rarely-Used Combinations of Garbage Collectors

Michal Frajt michal at frajt.eu
Fri Dec 7 20:17:04 UTC 2012


Hi Monica,

We run a G1 experiment once a year. As soon as we get a HotSpot crash, we revert to our iCMS configuration, proven over years, and forget all about it. I admit that the 2012 G1 experiment ran for several days in our development environment without a crash. But the crash came and we reverted; the log files and core dumps are unfortunately gone. Note that the experiment was done with J2SE 6, as we were not yet J2SE 7 compatible. I can simply activate it again for some processes, this time with J2SE 7 (I will download the latest version).

When we were analyzing the GC/CMS code, we found the ParNew adaptive sizing very interesting. We experimented a bit. I will experiment again and send you the hs_err_file and the core file.

Regards
Michal



From: "Monica Beckwith" monica.beckwith at oracle.com
To: "John Cuthbertson" john.cuthbertson at oracle.com
Cc: "Michal Frajt" michal at frajt.eu, hotspot-gc-dev at openjdk.java.net
Date: Fri, 07 Dec 2012 13:37:25 -0600
Subject: Re: JEP 173: Remove Rarely-Used Combinations of Garbage Collectors


Hi Michal -

I don't mean to derail this conversation, but it would be nice if
you could list the Java version for your 2012 experiment with G1.
And as John mentioned below, we are interested in hearing about your
experience with G1 and we would like to help.

-Monica

On 12/7/2012 11:53 AM, John Cuthbertson wrote:

Hi Michal,

If you are seeing a crash in G1 with your application, I'm very
interested in hearing about it. Do you have GC logs, the hs_err
file, or a core file (all three would be better)? Are you able to run
heap verification on a test system? That might give us a few more
details.

Thanks,

JohnC

On 12/07/12 09:22, Michal Frajt wrote:

Hi all,

We are not very pleased with the discussion about the iCMS removal. We are successfully using the iCMS mode on 300+ production machines running a "low"-latency trading system. The iCMS mode gives us behavior that the normal CMS mode is not able to replicate, and G1 would crash as soon as it saw an opportunity to crash.

Incremental CMS mode
+ keeps the old gen collected and ready to accommodate new objects
+ a regularly collected old gen reduces the fragmentation risk
+ the cycle interval can be controlled via the duty cycle (and the number of CMS threads??)
+ automatic pacing reacts to the promotion rate dynamically and reduces promotion failures
+ weak references are processed regularly (a mandatory requirement of our data distribution framework and design patterns: weak listeners, weak caches, etc.)
+ finalizers are called regularly (just useful)
+ no CPU peaks, as marking runs permanently (duty cycle > 0), which has a constant and predictable CPU/memory bandwidth cost to the application part
+ regular overview of the live set size
- requires more CPU (thread yields, extra code for handling increments, etc.)

G1
- 2009 hotspot crash
- 2010 hotspot crash
- 2011 hotspot crash
- 2012 hotspot crash
- 2013 ???


Just recently we fixed a bug in the CMSWaitDuration implementation and made the waiting functionality available for the iCMS mode. Our iCMS STW pauses were reduced by a factor of 10 and are now not much longer than regular ParNew pauses. If someone were additionally able to fix the invalid CMS-remark time reporting, thus presenting the real, better STW pauses, the CMS collector (including iCMS) could still be considered a very competitive alternative to the unstable G1 collector, and there would not be any thoughts about "Removing Rarely-Used Combinations of Garbage Collectors".

http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-August/001297.html
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004892.html

We can provide 200 assorted iCMS GC log files, produced every day (JDK 1.6.>30, OpenJDK 1.6.x patch 7189971, soon OpenJDK 1.7.x patch 7189971, Solaris/x86, Linux, heaps 4-32GB).

Best regards,
Michal
 
 
From: hotspot-gc-dev-bounces at openjdk.java.net
To: "Jon Masamitsu" jon.masamitsu at oracle.com
Cc: hotspot-gc-dev at openjdk.java.net
Date: Fri, 7 Dec 2012 17:20:56 +0100
Subject: Re: JEP 173: Remove Rarely-Used Combinations of Garbage Collectors

Hi Jon,

Indeed, I do believe a bug is involved in one case... but I see many cases of iCMS being used. And maybe it's a bug in the other cases also. I can't say, because all I've got in those cases is a GC log. And maybe in those cases iCMS was used for no apparent reason. All I wanted to do is point out that iCMS *is* being used. I can't say the same about the other collectors on the chopping block. At the end of the day I'm just a little voice here that isn't obliged to make any of this work. All I can do is let you know what I'm seeing out in the wild. At the end of the day, the decision lies elsewhere... and that's ok ;-)

Regards,
Kirk

On 2012-12-07, at 4:44 PM, Jon Masamitsu wrote:

Kirk,

If iCMS is outperforming CMS on big apps on big machines,
that's a bug in CMS.  Case in point is the CMSWaitDuration
fix that is in review now.   I certainly understand that customers
have to work with what they are given and some
unexpected paths have been taken.   Hopefully, we can
do better.

Jon

On 12/6/2012 12:58 PM, Kirk Pepperdine wrote:

Hi all,

I started an email this morning trying to outline why iCMS is better than CMS, but I then needed to start presenting my performance tuning seminar. I'll start again and kill the draft. So, to Ramki's request: the simple reason is that having iCMS run keeps memory cleaner, so they never run into situations where they experience very long pause times. With CMS they always ran into a situation where the pause times exceeded response time requirements (large trading application). This was but one use case. Unfortunately I only have production iCMS logs from this client. I've also received (via my "send me your GC log" program) a number of other logs from people with similar low-latency requirements. I must say that I was quite frankly surprised to see iCMS showing up in the logs and that it was being used in so many places. That said, the results were also surprisingly good given that this use case (24 cores, 32g heap) runs counter to the original intent. But then I've seen a number of "odd ball" CMS configurations that also make sense in hindsight.

As for a reference app... very difficult to come up with... I've been picking at "reference" apps for a while... and the problem is... as soon as I get a good one... you guys do something in the JVM that suddenly makes it very disappointing to use the app for a performance tuning demonstration. So, instead of answering Ramki, I had to de-tune one of my reference apps that was running perfectly horribly until I started using 1.7.0_05... Can I ask you all to please STOP DOING THAT ;-) .... stop making it harder to de-tune my reference app!!!!

Seriously, the biggest problem that I see right now is adaptive sizing. The default parallel collector combination works brilliantly, except that adaptive sizing leaves survivor spaces undersized. This undersizing often leads to an increased frequency of full collections. I'm very much interested in looking at how this might be corrected.

-- Kirk

On 2012-12-06, at 2:23 PM, Bengt Rutisson wrote:

Hi Kirk and Ramki,

On 12/6/12 12:05 AM, Srinivas Ramakrishna wrote:

> I am thinking that if we have a "test case" or publicly available application that can serve as a "witness" to this, it would
> allow us to learn a few useful things on how regular CMS might do better for such apps, and understand the basis of
> this difference. (Unless you have already analysed it and can share your summary of it.)

Yes, I totally agree with this. If there are cases where i-CMS is better than regular CMS we need to understand why and should try to get CMS to perform as well (or better). This is a much more appealing solution to me than to keep the extra complexity that i-CMS introduces.

Kirk, if you have log files of runs with CMS and i-CMS, it would be great if you could pass them along. I would be very interested in analyzing why i-CMS would perform better than CMS.

Thanks,
Bengt

> thanks!
> -- ramki

On Wed, Dec 5, 2012 at 3:01 PM, Srinivas Ramakrishna wrote:
Hi Kirk --

On Wed, Dec 5, 2012 at 2:48 PM, Kirk Pepperdine wrote:
Hi all,


The JEPs are coming in fast and furious. There is a customer use case for iCMS... it's used by low-latency applications... and quite successfully in fact. iCMS manages large heaps much better than CMS does, which translates into more manageable pause times... I've got logs from a number of customers that rely on iCMS.

This is very interesting indeed (and something I had vaguely heard a few years ago from the general grapevine, although never actually understood
why it must be so). Could you go a bit deeper on why this is so? What exactly is it about doing a "slow, spread-out, incremental CMS collection"
that makes it work better than bang-bang vanilla CMS in large multi-core, server environments? Perhaps the insights from that might translate into
something useful for vanilla CMS?

Your experience does indicate that we must proceed with some caution here before we deprecate iCMS, given it might still have some useful life
(notwithstanding my own instincts to the contrary -- in server environments -- expressed in an earlier email before I had seen yours).

thanks.
-- ramki


Regards,
Kirk


On 2012-12-05, at 11:10 PM, mark.reinhold at oracle.com wrote:

Posted: http://openjdk.java.net/jeps/173

- Mark

--
Monica Beckwith | Java Performance Engineer
VOIP: +1 512 401 1274
Texas