RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

Thu Jan 10 20:20:28 UTC 2013

Hi Michal,

On 1/10/2013 11:46 AM, John Cuthbertson wrote:
> Hi Michal,
>
> Many apologies for the delay in generating a new webrev for this 
> change but here is the new one: 
> http://cr.openjdk.java.net/~johnc/7189971/webrev.1/
>
> Can you verify the webrev to make sure that changes have been applied 
> correctly? Looking at the new webrev it seems that the setting of the 
> CMS has been moved back above the return out of the loop. Was this 
> intentional?

The above should be "... setting of the CMS token has been ...".

JohnC

>
> I've done a couple of sanity tests with GCOld with CMSWaitDuration=0 
> and CMSWaitDuration=1500 with CMS.
>
> Regards,
>
> JohnC
>
> On 12/12/2012 4:35 AM, Michal Frajt wrote:
>> All,
>>   Find the attached patch. It implements proposed recommendations and 
>> requested changes. Please mind that the CMSWaitDuration set to -1 
>> (never wait) requires new parameter CMSCheckInterval (develop only, 
>> 1000 milliseconds default - constant).  The parameter defines the 
>> next CMS cycle start check interval in the case there are no 
>> desynchronization (notifications) events on the CGC_lock.
>>
>> Tested with the Solaris/amd64 build
>>   CMS
>> + CMSWaitDuration>0 OK
>> + CMSWaitDuration=0 OK
>> + CMSWaitDuration<0 OK
>>   iCMS
>> + CMSWaitDuration>0 OK
>> + CMSWaitDuration=0 OK
>> + CMSWaitDuration<0 OK
>>   Regards,
>> Michal
>>     Od: hotspot-gc-dev-bounces at openjdk.java.net
>> Komu: hotspot-gc-dev at openjdk.java.net
>> Kopie:
>> Datum: Fri,  7 Dec 2012 18:48:48 +0100
>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>> non-incremental  mode of CMS
>>
>>> Hi John/Jon/Ramki,
>>>
>>> All proposed recommendations and requested changes have been 
>>> implemented. We are going to test it on Monday. You will get the new 
>>> tested patch soon.
>>>
>>> The attached code here just got compiled, no test executed yet, it 
>>> might contain a bug, but you can quickly review it and send your 
>>> comments.
>>>
>>> Best regards
>>> Michal
>>>
>>>
>>> // Wait until the next synchronous GC, a concurrent full gc request,
>>> // or a timeout, whichever is earlier.
>>> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long 
>>> t_millis) {
>>>    // Wait time in millis or 0 value representing infinite wait for 
>>> a scavenge
>>>    assert(t_millis >= 0, "Wait time for scavenge should be 0 or 
>>> positive");
>>>
>>>    GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>    double start_time_secs = os::elapsedTime();
>>>    double end_time_secs = start_time_secs + (t_millis / ((double) 
>>> MILLIUNITS));
>>>
>>>    // Total collections count before waiting loop
>>>    unsigned int before_count;
>>>    {
>>>      MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>      before_count = gch->total_collections();
>>>    }
>>>
>>>    unsigned int loop_count = 0;
>>>
>>>    while(!_should_terminate) {
>>>      double now_time = os::elapsedTime();
>>>      long wait_time_millis;
>>>
>>>      if(t_millis != 0) {
>>>        // New wait limit
>>>        wait_time_millis = (long) ((end_time_secs - now_time) * 
>>> MILLIUNITS);
>>>        if(wait_time_millis <= 0) {
>>>          // Wait time is over
>>>          break;
>>>        }
>>>      } else {
>>>        // No wait limit, wait if necessary forever
>>>        wait_time_millis = 0;
>>>      }
>>>
>>>      // Wait until the next event or the remaining timeout
>>>      {
>>>        MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>>
>>>        set_CMS_flag(CMS_cms_wants_token);   // to provoke notifies
>>>        if (_should_terminate || _collector->_full_gc_requested) {
>>>          return;
>>>        }
>>>        assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>>>        CGC_lock->wait(Mutex::_no_safepoint_check_flag, 
>>> wait_time_millis);
>>>        clear_CMS_flag(CMS_cms_wants_token);
>>>        assert(!CMS_flag_is_set(CMS_cms_has_token | 
>>> CMS_cms_wants_token),
>>>               "Should not be set");
>>>      }
>>>
>>>      // Extra wait time check before entering the heap lock to get 
>>> the collection count
>>>      if(t_millis != 0 && os::elapsedTime() >= end_time_secs) {
>>>        // Wait time is over
>>>        break;
>>>      }
>>>
>>>      // Total collections count after the event
>>>      unsigned int after_count;
>>>      {
>>>        MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>        after_count = gch->total_collections();
>>>      }
>>>
>>>      if(before_count != after_count) {
>>>        // There was a collection - success
>>>        break;
>>>      }
>>>
>>>      // Too many loops warning
>>>      if(++loop_count == 0) {
>>>        warning("wait_on_cms_lock_for_scavenge() has looped %d 
>>> times", loop_count - 1);
>>>      }
>>>    }
>>> }
>>>
>>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
>>>    while (!_should_terminate) {
>>>      if (CMSIncrementalMode) {
>>>        icms_wait();
>>>        if(CMSWaitDuration >= 0) {
>>>          // Wait until the next synchronous GC, a concurrent full gc
>>>          // request or a timeout, whichever is earlier.
>>>          wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>        }
>>>        return;
>>>      } else {
>>>        if(CMSWaitDuration >= 0) {
>>>          // Wait until the next synchronous GC, a concurrent full gc
>>>          // request or a timeout, whichever is earlier.
>>>          wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>        } else {
>>>          // Wait until any cms_lock event not to call 
>>> shouldConcurrentCollect permanently
>>>          wait_on_cms_lock(0);
>>>        }
>>>      }
>>>      // Check if we should start a CMS collection cycle
>>>      if (_collector->shouldConcurrentCollect()) {
>>>        return;
>>>      }
>>>      // .. collection criterion not yet met, let's go back
>>>      // and wait some more
>>>    }
>>> }
>>>
>>>   Od: hotspot-gc-dev-bounces at openjdk.java.net
>>> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson" 
>>> john.cuthbertson at oracle.com
>>> Kopie: hotspot-gc-dev at openjdk.java.net
>>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>>> non-incremental mode of CMS
>>>
>>>> Hi John --
>>>>
>>>> wrt the changes posted, i see the intent of the code and agree with
>>>> it. I have a few minor suggestions on the
>>>> details of how it's implemented. My comments are inline below,
>>>> interleaved with the code:
>>>>
>>>>   317 // Wait until the next synchronous GC, a concurrent full gc 
>>>> request,
>>>>   318 // or a timeout, whichever is earlier.
>>>>   319 void 
>>>> ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>> t_millis) {
>>>>   320   // Wait for any cms_lock event when timeout not specified 
>>>> (0 millis)
>>>>   321   if (t_millis == 0) {
>>>>   322     wait_on_cms_lock(t_millis);
>>>>   323     return;
>>>>   324   }
>>>>
>>>> I'd completely avoid the special case above because it would miss the
>>>> part about waiting for a
>>>> scavenge, instead dealing with that case in the code in the loop below
>>>> directly. The idea
>>>> of the "0" value is not to ask that we return immediately, but that we
>>>> wait, if necessary
>>>> forever, for a scavenge. The "0" really represents the value infinity
>>>> in that sense. This would
>>>> be in keeping with our use of wait() with a "0" value for timeout at
>>>> other places in the JVM as
>>>> well, so it's consistent.
>>>>
>>>>   325
>>>>   326   GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>   327   double start_time = os::elapsedTime();
>>>>   328   double end_time = start_time + (t_millis / 1000.0);
>>>>
>>>> Note how, the end_time == start_time for the special case of t_millis
>>>> == 0, so we need to treat that
>>>> case specially below.
>>>>
>>>>   329
>>>>   330   // Total collections count before waiting loop
>>>>   331   unsigned int before_count;
>>>>   332   {
>>>>   333     MutexLockerEx hl(Heap_lock, 
>>>> Mutex::_no_safepoint_check_flag);
>>>>   334     before_count = gch->total_collections();
>>>>   335   }
>>>>
>>>> Good.
>>>>
>>>>   336
>>>>   337   while (true) {
>>>>   338     double now_time = os::elapsedTime();
>>>>   339     long wait_time_millis = (long)((end_time - now_time) * 
>>>> 1000.0);
>>>>   340
>>>>   341     if (wait_time_millis <= 0) {
>>>>   342       // Wait time is over
>>>>   343       break;
>>>>   344     }
>>>>
>>>> Modify to:
>>>>             if (t_millis != 0) {
>>>>                if  (wait_time_millis <= 0)  {
>>>>                   // Wait time is over
>>>>                   break;
>>>>               }
>>>>            } else {
>>>>               wait_time_millis = 0;     // for use in wait() below
>>>>           }
>>>>
>>>>   345
>>>>   346     // Wait until the next event or the remaining timeout
>>>>   347     {
>>>>   348       MutexLockerEx x(CGC_lock, 
>>>> Mutex::_no_safepoint_check_flag);
>>>>   349       if (_should_terminate || _collector->_full_gc_requested) {
>>>>   350         return;
>>>>   351       }
>>>>   352       set_CMS_flag(CMS_cms_wants_token);   // to provoke 
>>>> notifies
>>>>
>>>> insert:     assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>>>>
>>>>   353 CGC_lock->wait(Mutex::_no_safepoint_check_flag, 
>>>> wait_time_millis);
>>>>   354       clear_CMS_flag(CMS_cms_wants_token);
>>>>   355       assert(!CMS_flag_is_set(CMS_cms_has_token | 
>>>> CMS_cms_wants_token),
>>>>   356              "Should not be set");
>>>>   357     }
>>>>   358
>>>>   359     // Extra wait time check before entering the heap lock to 
>>>> get
>>>> the collection count
>>>>   360     if (os::elapsedTime() >= end_time) {
>>>>   361       // Wait time is over
>>>>   362       break;
>>>>   363     }
>>>>
>>>> Modify above wait time check to make an exception for t_miliis == 0:
>>>>             // Extra wait time check before checking collection count
>>>>             if (t_millis != 0 && os::elapsedTime() >= end_time) {
>>>>                // wait time exceeded
>>>>                break;
>>>>             }
>>>>
>>>>   364
>>>>   365     // Total collections count after the event
>>>>   366     unsigned int after_count;
>>>>   367     {
>>>>   368       MutexLockerEx hl(Heap_lock, 
>>>> Mutex::_no_safepoint_check_flag);
>>>>   369       after_count = gch->total_collections();
>>>>   370     }
>>>>   371
>>>>   372     if (before_count != after_count) {
>>>>   373       // There was a collection - success
>>>>   374       break;
>>>>   375     }
>>>>   376   }
>>>>   377 }
>>>>
>>>> While it is true that we do not have a case where the method is called
>>>> with a time of "0", I think we
>>>> want that value to be treated correctly as "infinity". For the case
>>>> where we do not want a wait at all,
>>>> we should use a small positive value, like "1 ms" to signal that
>>>> intent, i.e. -XX:CMSWaitDuration=1,
>>>> reserving CMSWaitDuration=0 to signal infinity. (We could also do that
>>>> by reserving negative values to
>>>> signal infinity, but that would make the code in the loop a bit 
>>>> more fiddly.)
>>>>
>>>> As mentioned in my previous email, I'd like to see this tested with
>>>> CMSWaitDuration set to 0, positive and
>>>> negative values (if necessary, we can reject negative value settings),
>>>> and with ExplicitGCInvokesConcurrent.
>>>>
>>>> Rest looks OK to me, although I am not sure how this behaves with
>>>> iCMS, as I have forgotten that part of the
>>>> code.
>>>>
>>>> Finally, in current code (before these changes) there are two callers
>>>> of the former wait_for_cms_lock() method,
>>>> one here in sleepBeforeNextCycle() and one from the precleaning loop.
>>>> I think the right thing has been done
>>>> in terms of leaving the latter alone.
>>>>
>>>> It would be good if this were checked with CMSInitiatingOccupancy set
>>>> to 0 (or a small value), CMSWaitDuration set to 0,
>>>> -+PromotionFailureALot and checking that (1) it does not deadlock (2)
>>>> CMS cycles start very soon after the end of
>>>> a scavenge (and not at random times as Michal has observed earlier,
>>>> although i am guessing that is difficult to test).
>>>> It would be good to repeat the above test with iCMS as well.
>>>>
>>>> thanks!
>>>> -- ramki
>>>>
>>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna  wrote:
>>>>> Thanks Jon for the pointer:
>>>>>
>>>>>
>>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu  wrote:
>>>>>>
>>>>>>
>>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>>>>> The high level idea looks correct. I'll look at the details in a 
>>>>>>> bit (seriously this time; sorry it dropped off my plate last 
>>>>>>> time I promised).
>>>>>>> Does anyone have a pointer to the related discussion thread on 
>>>>>>> this aias from earlier in the year, by chance, so one could 
>>>>>>> refresh one's
>>>>>>> memory of that discussion?
>>>>>>
>>>>>> subj: CMSWaitDuration unstable behavior
>>>>>>
>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html 
>>>>>>
>>>>>>
>>>>>>
>>>>> also: 
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
>>>>>
>>>>> On to it later this afternoon, and TTYL w/review.
>>>>> - ramki
>