RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

Thu Dec 13 23:33:39 UTC 2012

Hi Michal,

Thanks for the revised patch. I'll apply it, generate a revised webrev, 
and send out a follow-up request.

JohnC

On 12/12/12 04:35, Michal Frajt wrote:
> All, 
>  
> Find the attached patch. It implements proposed recommendations and requested changes. Please mind that the CMSWaitDuration set to -1 (never wait) requires new parameter CMSCheckInterval (develop only, 1000 milliseconds default - constant).  The parameter defines the next CMS cycle start check interval in the case there are no desynchronization (notifications) events on the CGC_lock.
>
> Tested with the Solaris/amd64 build
>  
> CMS 
> + CMSWaitDuration>0 OK
> + CMSWaitDuration=0 OK
> + CMSWaitDuration<0 OK
>  
> iCMS 
> + CMSWaitDuration>0 OK
> + CMSWaitDuration=0 OK
> + CMSWaitDuration<0 OK
>  
> Regards,
> Michal
>  
>  
> Od: hotspot-gc-dev-bounces at openjdk.java.net
> Komu: hotspot-gc-dev at openjdk.java.net
> Kopie: 
> Datum: Fri,  7 Dec 2012 18:48:48 +0100
> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental  mode of CMS
>
>   
>> Hi John/Jon/Ramki,
>>
>> All proposed recommendations and requested changes have been implemented. We are going to test it on Monday. You will get the new tested patch soon. 
>>
>> The attached code here just got compiled, no test executed yet, it might contain a bug, but you can quickly review it and send your comments.
>>
>> Best regards
>> Michal
>>
>>
>> // Wait until the next synchronous GC, a concurrent full gc request,
>> // or a timeout, whichever is earlier.
>> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long t_millis) {
>>   // Wait time in millis or 0 value representing infinite wait for a scavenge
>>   assert(t_millis >= 0, "Wait time for scavenge should be 0 or positive");
>>
>>   GenCollectedHeap* gch = GenCollectedHeap::heap();
>>   double start_time_secs = os::elapsedTime();
>>   double end_time_secs = start_time_secs + (t_millis / ((double) MILLIUNITS));
>>
>>   // Total collections count before waiting loop
>>   unsigned int before_count;
>>   {
>>     MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>     before_count = gch->total_collections();
>>   }
>>
>>   unsigned int loop_count = 0;
>>
>>   while(!_should_terminate) {
>>     double now_time = os::elapsedTime();
>>     long wait_time_millis;
>>
>>     if(t_millis != 0) {
>>       // New wait limit
>>       wait_time_millis = (long) ((end_time_secs - now_time) * MILLIUNITS);
>>       if(wait_time_millis <= 0) {
>>         // Wait time is over
>>         break;
>>       }
>>     } else {
>>       // No wait limit, wait if necessary forever
>>       wait_time_millis = 0;
>>     }
>>
>>     // Wait until the next event or the remaining timeout
>>     {
>>       MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>
>>       set_CMS_flag(CMS_cms_wants_token);   // to provoke notifies
>>       if (_should_terminate || _collector->_full_gc_requested) {
>>         return;
>>       }
>>       assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>>       CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
>>       clear_CMS_flag(CMS_cms_wants_token);
>>       assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
>>              "Should not be set");
>>     }
>>
>>     // Extra wait time check before entering the heap lock to get the collection count
>>     if(t_millis != 0 && os::elapsedTime() >= end_time_secs) {
>>       // Wait time is over
>>       break;
>>     }
>>
>>     // Total collections count after the event
>>     unsigned int after_count;
>>     {
>>       MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>       after_count = gch->total_collections();
>>     }
>>
>>     if(before_count != after_count) {
>>       // There was a collection - success
>>       break;
>>     }
>>
>>     // Too many loops warning
>>     if(++loop_count == 0) {
>>       warning("wait_on_cms_lock_for_scavenge() has looped %d times", loop_count - 1);
>>     }
>>   }
>> }
>>
>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
>>   while (!_should_terminate) {
>>     if (CMSIncrementalMode) {
>>       icms_wait();
>>       if(CMSWaitDuration >= 0) {
>>         // Wait until the next synchronous GC, a concurrent full gc
>>         // request or a timeout, whichever is earlier.
>>         wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>       }
>>       return;
>>     } else {
>>       if(CMSWaitDuration >= 0) {
>>         // Wait until the next synchronous GC, a concurrent full gc
>>         // request or a timeout, whichever is earlier.
>>         wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>       } else {
>>         // Wait until any cms_lock event not to call shouldConcurrentCollect permanently
>>         wait_on_cms_lock(0);
>>       }
>>     }
>>     // Check if we should start a CMS collection cycle
>>     if (_collector->shouldConcurrentCollect()) {
>>       return;
>>     }
>>     // .. collection criterion not yet met, let's go back
>>     // and wait some more
>>   }
>> }
>>  
>>
>>  
>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson" john.cuthbertson at oracle.com
>> Kopie: hotspot-gc-dev at openjdk.java.net
>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
>>
>>     
>>> Hi John --
>>>
>>> wrt the changes posted, i see the intent of the code and agree with
>>> it. I have a few minor suggestions on the
>>> details of how it's implemented. My comments are inline below,
>>> interleaved with the code:
>>>
>>>  317 // Wait until the next synchronous GC, a concurrent full gc request,
>>>  318 // or a timeout, whichever is earlier.
>>>  319 void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>> t_millis) {
>>>  320   // Wait for any cms_lock event when timeout not specified (0 millis)
>>>  321   if (t_millis == 0) {
>>>  322     wait_on_cms_lock(t_millis);
>>>  323     return;
>>>  324   }
>>>
>>> I'd completely avoid the special case above because it would miss the
>>> part about waiting for a
>>> scavenge, instead dealing with that case in the code in the loop below
>>> directly. The idea
>>> of the "0" value is not to ask that we return immediately, but that we
>>> wait, if necessary
>>> forever, for a scavenge. The "0" really represents the value infinity
>>> in that sense. This would
>>> be in keeping with our use of wait() with a "0" value for timeout at
>>> other places in the JVM as
>>> well, so it's consistent.
>>>
>>>  325
>>>  326   GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>  327   double start_time = os::elapsedTime();
>>>  328   double end_time = start_time + (t_millis / 1000.0);
>>>
>>> Note how, the end_time == start_time for the special case of t_millis
>>> == 0, so we need to treat that
>>> case specially below.
>>>
>>>  329
>>>  330   // Total collections count before waiting loop
>>>  331   unsigned int before_count;
>>>  332   {
>>>  333     MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>  334     before_count = gch->total_collections();
>>>  335   }
>>>
>>> Good.
>>>
>>>  336
>>>  337   while (true) {
>>>  338     double now_time = os::elapsedTime();
>>>  339     long wait_time_millis = (long)((end_time - now_time) * 1000.0);
>>>  340
>>>  341     if (wait_time_millis <= 0) {
>>>  342       // Wait time is over
>>>  343       break;
>>>  344     }
>>>
>>> Modify to:
>>>            if (t_millis != 0) {
>>>               if  (wait_time_millis <= 0)  {
>>>                  // Wait time is over
>>>                  break;
>>>              }
>>>           } else {
>>>              wait_time_millis = 0;     // for use in wait() below
>>>          }
>>>
>>>  345
>>>  346     // Wait until the next event or the remaining timeout
>>>  347     {
>>>  348       MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>>  349       if (_should_terminate || _collector->_full_gc_requested) {
>>>  350         return;
>>>  351       }
>>>  352       set_CMS_flag(CMS_cms_wants_token);   // to provoke notifies
>>>
>>> insert:     assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>>>
>>>  353       CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
>>>  354       clear_CMS_flag(CMS_cms_wants_token);
>>>  355       assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
>>>  356              "Should not be set");
>>>  357     }
>>>  358
>>>  359     // Extra wait time check before entering the heap lock to get
>>> the collection count
>>>  360     if (os::elapsedTime() >= end_time) {
>>>  361       // Wait time is over
>>>  362       break;
>>>  363     }
>>>
>>> Modify above wait time check to make an exception for t_miliis == 0:
>>>            // Extra wait time check before checking collection count
>>>            if (t_millis != 0 && os::elapsedTime() >= end_time) {
>>>               // wait time exceeded
>>>               break;
>>>            }
>>>
>>>  364
>>>  365     // Total collections count after the event
>>>  366     unsigned int after_count;
>>>  367     {
>>>  368       MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>  369       after_count = gch->total_collections();
>>>  370     }
>>>  371
>>>  372     if (before_count != after_count) {
>>>  373       // There was a collection - success
>>>  374       break;
>>>  375     }
>>>  376   }
>>>  377 }
>>>
>>> While it is true that we do not have a case where the method is called
>>> with a time of "0", I think we
>>> want that value to be treated correctly as "infinity". For the case
>>> where we do not want a wait at all,
>>> we should use a small positive value, like "1 ms" to signal that
>>> intent, i.e. -XX:CMSWaitDuration=1,
>>> reserving CMSWaitDuration=0 to signal infinity. (We could also do that
>>> by reserving negative values to
>>> signal infinity, but that would make the code in the loop a bit more fiddly.)
>>>
>>> As mentioned in my previous email, I'd like to see this tested with
>>> CMSWaitDuration set to 0, positive and
>>> negative values (if necessary, we can reject negative value settings),
>>> and with ExplicitGCInvokesConcurrent.
>>>
>>> Rest looks OK to me, although I am not sure how this behaves with
>>> iCMS, as I have forgotten that part of the
>>> code.
>>>
>>> Finally, in current code (before these changes) there are two callers
>>> of the former wait_for_cms_lock() method,
>>> one here in sleepBeforeNextCycle() and one from the precleaning loop.
>>> I think the right thing has been done
>>> in terms of leaving the latter alone.
>>>
>>> It would be good if this were checked with CMSInitiatingOccupancy set
>>> to 0 (or a small value), CMSWaitDuration set to 0,
>>> -+PromotionFailureALot and checking that (1) it does not deadlock (2)
>>> CMS cycles start very soon after the end of
>>> a scavenge (and not at random times as Michal has observed earlier,
>>> although i am guessing that is difficult to test).
>>> It would be good to repeat the above test with iCMS as well.
>>>
>>> thanks!
>>> -- ramki
>>>
>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna  wrote:
>>>       
>>>> Thanks Jon for the pointer:
>>>>
>>>>
>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu  wrote:
>>>>         
>>>>>
>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>>>           
>>>>>> The high level idea looks correct. I'll look at the details in a bit (seriously this time; sorry it dropped off my plate last time I promised).
>>>>>> Does anyone have a pointer to the related discussion thread on this aias from earlier in the year, by chance, so one could refresh one's
>>>>>> memory of that discussion?
>>>>>>             
>>>>> subj: CMSWaitDuration unstable behavior
>>>>>
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html
>>>>>
>>>>>
>>>>>           
>>>> also: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
>>>>
>>>> On to it later this afternoon, and TTYL w/review.
>>>> - ramki
>>>>