Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

Wed Dec 12 12:35:25 UTC 2012

All, 

Find the attached patch. It implements proposed recommendations and requested changes. Please mind that the CMSWaitDuration set to -1 (never wait) requires new parameter CMSCheckInterval (develop only, 1000 milliseconds default - constant).  The parameter defines the next CMS cycle start check interval in the case there are no desynchronization (notifications) events on the CGC_lock.

Tested with the Solaris/amd64 build

CMS 
+ CMSWaitDuration>0 OK
+ CMSWaitDuration=0 OK
+ CMSWaitDuration<0 OK

iCMS 
+ CMSWaitDuration>0 OK
+ CMSWaitDuration=0 OK
+ CMSWaitDuration<0 OK

Regards,
Michal

Od: hotspot-gc-dev-bounces at openjdk.java.net
Komu: hotspot-gc-dev at openjdk.java.net
Kopie: 
Datum: Fri,  7 Dec 2012 18:48:48 +0100
Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental  mode of CMS

> Hi John/Jon/Ramki,
> 
> All proposed recommendations and requested changes have been implemented. We are going to test it on Monday. You will get the new tested patch soon. 
> 
> The attached code here just got compiled, no test executed yet, it might contain a bug, but you can quickly review it and send your comments.
> 
> Best regards
> Michal
> 
> 
> // Wait until the next synchronous GC, a concurrent full gc request,
> // or a timeout, whichever is earlier.
> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long t_millis) {
>   // Wait time in millis or 0 value representing infinite wait for a scavenge
>   assert(t_millis >= 0, "Wait time for scavenge should be 0 or positive");
> 
>   GenCollectedHeap* gch = GenCollectedHeap::heap();
>   double start_time_secs = os::elapsedTime();
>   double end_time_secs = start_time_secs + (t_millis / ((double) MILLIUNITS));
> 
>   // Total collections count before waiting loop
>   unsigned int before_count;
>   {
>     MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>     before_count = gch->total_collections();
>   }
> 
>   unsigned int loop_count = 0;
> 
>   while(!_should_terminate) {
>     double now_time = os::elapsedTime();
>     long wait_time_millis;
> 
>     if(t_millis != 0) {
>       // New wait limit
>       wait_time_millis = (long) ((end_time_secs - now_time) * MILLIUNITS);
>       if(wait_time_millis <= 0) {
>         // Wait time is over
>         break;
>       }
>     } else {
>       // No wait limit, wait if necessary forever
>       wait_time_millis = 0;
>     }
> 
>     // Wait until the next event or the remaining timeout
>     {
>       MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
> 
>       set_CMS_flag(CMS_cms_wants_token);   // to provoke notifies
>       if (_should_terminate || _collector->_full_gc_requested) {
>         return;
>       }
>       assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>       CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
>       clear_CMS_flag(CMS_cms_wants_token);
>       assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
>              "Should not be set");
>     }
> 
>     // Extra wait time check before entering the heap lock to get the collection count
>     if(t_millis != 0 && os::elapsedTime() >= end_time_secs) {
>       // Wait time is over
>       break;
>     }
> 
>     // Total collections count after the event
>     unsigned int after_count;
>     {
>       MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>       after_count = gch->total_collections();
>     }
> 
>     if(before_count != after_count) {
>       // There was a collection - success
>       break;
>     }
> 
>     // Too many loops warning
>     if(++loop_count == 0) {
>       warning("wait_on_cms_lock_for_scavenge() has looped %d times", loop_count - 1);
>     }
>   }
> }
> 
> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
>   while (!_should_terminate) {
>     if (CMSIncrementalMode) {
>       icms_wait();
>       if(CMSWaitDuration >= 0) {
>         // Wait until the next synchronous GC, a concurrent full gc
>         // request or a timeout, whichever is earlier.
>         wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>       }
>       return;
>     } else {
>       if(CMSWaitDuration >= 0) {
>         // Wait until the next synchronous GC, a concurrent full gc
>         // request or a timeout, whichever is earlier.
>         wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>       } else {
>         // Wait until any cms_lock event not to call shouldConcurrentCollect permanently
>         wait_on_cms_lock(0);
>       }
>     }
>     // Check if we should start a CMS collection cycle
>     if (_collector->shouldConcurrentCollect()) {
>       return;
>     }
>     // .. collection criterion not yet met, let's go back
>     // and wait some more
>   }
> }
>  
> 
>  
> Od: hotspot-gc-dev-bounces at openjdk.java.net
> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson" john.cuthbertson at oracle.com
> Kopie: hotspot-gc-dev at openjdk.java.net
> Datum: Thu, 6 Dec 2012 23:43:29 -0800
> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
> 
> > Hi John --
> > 
> > wrt the changes posted, i see the intent of the code and agree with
> > it. I have a few minor suggestions on the
> > details of how it's implemented. My comments are inline below,
> > interleaved with the code:
> > 
> >  317 // Wait until the next synchronous GC, a concurrent full gc request,
> >  318 // or a timeout, whichever is earlier.
> >  319 void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
> > t_millis) {
> >  320   // Wait for any cms_lock event when timeout not specified (0 millis)
> >  321   if (t_millis == 0) {
> >  322     wait_on_cms_lock(t_millis);
> >  323     return;
> >  324   }
> > 
> > I'd completely avoid the special case above because it would miss the
> > part about waiting for a
> > scavenge, instead dealing with that case in the code in the loop below
> > directly. The idea
> > of the "0" value is not to ask that we return immediately, but that we
> > wait, if necessary
> > forever, for a scavenge. The "0" really represents the value infinity
> > in that sense. This would
> > be in keeping with our use of wait() with a "0" value for timeout at
> > other places in the JVM as
> > well, so it's consistent.
> > 
> >  325
> >  326   GenCollectedHeap* gch = GenCollectedHeap::heap();
> >  327   double start_time = os::elapsedTime();
> >  328   double end_time = start_time + (t_millis / 1000.0);
> > 
> > Note how, the end_time == start_time for the special case of t_millis
> > == 0, so we need to treat that
> > case specially below.
> > 
> >  329
> >  330   // Total collections count before waiting loop
> >  331   unsigned int before_count;
> >  332   {
> >  333     MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
> >  334     before_count = gch->total_collections();
> >  335   }
> > 
> > Good.
> > 
> >  336
> >  337   while (true) {
> >  338     double now_time = os::elapsedTime();
> >  339     long wait_time_millis = (long)((end_time - now_time) * 1000.0);
> >  340
> >  341     if (wait_time_millis <= 0) {
> >  342       // Wait time is over
> >  343       break;
> >  344     }
> > 
> > Modify to:
> >            if (t_millis != 0) {
> >               if  (wait_time_millis <= 0)  {
> >                  // Wait time is over
> >                  break;
> >              }
> >           } else {
> >              wait_time_millis = 0;     // for use in wait() below
> >          }
> > 
> >  345
> >  346     // Wait until the next event or the remaining timeout
> >  347     {
> >  348       MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
> >  349       if (_should_terminate || _collector->_full_gc_requested) {
> >  350         return;
> >  351       }
> >  352       set_CMS_flag(CMS_cms_wants_token);   // to provoke notifies
> > 
> > insert:     assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
> > 
> >  353       CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
> >  354       clear_CMS_flag(CMS_cms_wants_token);
> >  355       assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
> >  356              "Should not be set");
> >  357     }
> >  358
> >  359     // Extra wait time check before entering the heap lock to get
> > the collection count
> >  360     if (os::elapsedTime() >= end_time) {
> >  361       // Wait time is over
> >  362       break;
> >  363     }
> > 
> > Modify above wait time check to make an exception for t_miliis == 0:
> >            // Extra wait time check before checking collection count
> >            if (t_millis != 0 && os::elapsedTime() >= end_time) {
> >               // wait time exceeded
> >               break;
> >            }
> > 
> >  364
> >  365     // Total collections count after the event
> >  366     unsigned int after_count;
> >  367     {
> >  368       MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
> >  369       after_count = gch->total_collections();
> >  370     }
> >  371
> >  372     if (before_count != after_count) {
> >  373       // There was a collection - success
> >  374       break;
> >  375     }
> >  376   }
> >  377 }
> > 
> > While it is true that we do not have a case where the method is called
> > with a time of "0", I think we
> > want that value to be treated correctly as "infinity". For the case
> > where we do not want a wait at all,
> > we should use a small positive value, like "1 ms" to signal that
> > intent, i.e. -XX:CMSWaitDuration=1,
> > reserving CMSWaitDuration=0 to signal infinity. (We could also do that
> > by reserving negative values to
> > signal infinity, but that would make the code in the loop a bit more fiddly.)
> > 
> > As mentioned in my previous email, I'd like to see this tested with
> > CMSWaitDuration set to 0, positive and
> > negative values (if necessary, we can reject negative value settings),
> > and with ExplicitGCInvokesConcurrent.
> > 
> > Rest looks OK to me, although I am not sure how this behaves with
> > iCMS, as I have forgotten that part of the
> > code.
> > 
> > Finally, in current code (before these changes) there are two callers
> > of the former wait_for_cms_lock() method,
> > one here in sleepBeforeNextCycle() and one from the precleaning loop.
> > I think the right thing has been done
> > in terms of leaving the latter alone.
> > 
> > It would be good if this were checked with CMSInitiatingOccupancy set
> > to 0 (or a small value), CMSWaitDuration set to 0,
> > -+PromotionFailureALot and checking that (1) it does not deadlock (2)
> > CMS cycles start very soon after the end of
> > a scavenge (and not at random times as Michal has observed earlier,
> > although i am guessing that is difficult to test).
> > It would be good to repeat the above test with iCMS as well.
> > 
> > thanks!
> > -- ramki
> > 
> > On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna  wrote:
> > >
> > > Thanks Jon for the pointer:
> > >
> > >
> > > On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu  wrote:
> > >>
> > >>
> > >>
> > >> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
> > >>>
> > >>> The high level idea looks correct. I'll look at the details in a bit (seriously this time; sorry it dropped off my plate last time I promised).
> > >>> Does anyone have a pointer to the related discussion thread on this aias from earlier in the year, by chance, so one could refresh one's
> > >>> memory of that discussion?
> > >>
> > >>
> > >> subj: CMSWaitDuration unstable behavior
> > >>
> > >> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html
> > >>
> > >>
> > >
> > > also: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
> > >
> > > On to it later this afternoon, and TTYL w/review.
> > > - ramki
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: openjdk7u-7189971.patch
Type: application/octet-stream
Size: 6454 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20121212/3b09bc69/openjdk7u-7189971.patch>