RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

John Cuthbertson john.cuthbertson at oracle.com
Fri Jan 25 21:13:27 UTC 2013


Hi Michal,

The patch is applied. The new webrev can be found at 
http://cr.openjdk.java.net/~johnc/7189971/webrev.2/

Thanks,

JohnC

On 1/17/2013 2:58 AM, Michal Frajt wrote:
> Hi John,
>
> Please apply the attached patch to the webrev. You are right, the setting of the CMS token has been somehow moved back above the method return. Additionally I have fixed the printf of the unsigned loop counter (correct is %u).
>
> Regards,
> Michal
>   
> Od: hotspot-gc-dev-bounces at openjdk.java.net
> Komu: hotspot-gc-dev at openjdk.java.net
> Kopie:
> Datum: Thu, 10 Jan 2013 12:20:28 -0800
> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
>
>
>> Hi Michal,
>>
>> On 1/10/2013 11:46 AM, John Cuthbertson wrote:
>>> Hi Michal,
>>>
>>> Many apologies for the delay in generating a new webrev for this
>>> change but here is the new one:
>>> http://cr.openjdk.java.net/~johnc/7189971/webrev.1/
>>>
>>> Can you verify the webrev to make sure that changes have been applied
>>> correctly? Looking at the new webrev it seems that the setting of the
>>> CMS has been moved back above the return out of the loop. Was this
>>> intentional?
>> The above should be "... setting of the CMS token has been ...".
>>
>> JohnC
>>
>>> I've done a couple of sanity tests with GCOld with CMSWaitDuration=0
>>> and CMSWaitDuration=1500 with CMS.
>>>
>>> Regards,
>>>
>>> JohnC
>>>
>>> On 12/12/2012 4:35 AM, Michal Frajt wrote:
>>>> All,
>>>> Find the attached patch. It implements proposed recommendations and
>>>> requested changes. Please mind that the CMSWaitDuration set to -1
>>>> (never wait) requires new parameter CMSCheckInterval (develop only,
>>>> 1000 milliseconds default - constant). The parameter defines the
>>>> next CMS cycle start check interval in the case there are no
>>>> desynchronization (notifications) events on the CGC_lock.
>>>>
>>>> Tested with the Solaris/amd64 build
>>>> CMS
>>>> + CMSWaitDuration>0 OK
>>>> + CMSWaitDuration=0 OK
>>>> + CMSWaitDuration<0 OK
>>>> iCMS
>>>> + CMSWaitDuration>0 OK
>>>> + CMSWaitDuration=0 OK
>>>> + CMSWaitDuration<0 OK
>>>> Regards,
>>>> Michal
>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>>> Komu: hotspot-gc-dev at openjdk.java.net
>>>> Kopie:
>>>> Datum: Fri, 7 Dec 2012 18:48:48 +0100
>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
>>>> non-incremental mode of CMS
>>>>
>>>>> Hi John/Jon/Ramki,
>>>>>
>>>>> All proposed recommendations and requested changes have been
>>>>> implemented. We are going to test it on Monday. You will get the new
>>>>> tested patch soon.
>>>>>
>>>>> The attached code here just got compiled, no test executed yet, it
>>>>> might contain a bug, but you can quickly review it and send your
>>>>> comments.
>>>>>
>>>>> Best regards
>>>>> Michal
>>>>>
>>>>>
>>>>> // Wait until the next synchronous GC, a concurrent full gc request,
>>>>> // or a timeout, whichever is earlier.
>>>>> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>> t_millis) {
>>>>> // Wait time in millis or 0 value representing infinite wait for
>>>>> a scavenge
>>>>> assert(t_millis >= 0, "Wait time for scavenge should be 0 or
>>>>> positive");
>>>>>
>>>>> GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>> double start_time_secs = os::elapsedTime();
>>>>> double end_time_secs = start_time_secs + (t_millis / ((double)
>>>>> MILLIUNITS));
>>>>>
>>>>> // Total collections count before waiting loop
>>>>> unsigned int before_count;
>>>>> {
>>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>>> before_count = gch->total_collections();
>>>>> }
>>>>>
>>>>> unsigned int loop_count = 0;
>>>>>
>>>>> while(!_should_terminate) {
>>>>> double now_time = os::elapsedTime();
>>>>> long wait_time_millis;
>>>>>
>>>>> if(t_millis != 0) {
>>>>> // New wait limit
>>>>> wait_time_millis = (long) ((end_time_secs - now_time) *
>>>>> MILLIUNITS);
>>>>> if(wait_time_millis <= 0) {
>>>>> // Wait time is over
>>>>> break;
>>>>> }
>>>>> } else {
>>>>> // No wait limit, wait if necessary forever
>>>>> wait_time_millis = 0;
>>>>> }
>>>>>
>>>>> // Wait until the next event or the remaining timeout
>>>>> {
>>>>> MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>>>>
>>>>> set_CMS_flag(CMS_cms_wants_token); // to provoke notifies
>>>>> if (_should_terminate || _collector->_full_gc_requested) {
>>>>> return;
>>>>> }
>>>>> assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>>>>> CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>> wait_time_millis);
>>>>> clear_CMS_flag(CMS_cms_wants_token);
>>>>> assert(!CMS_flag_is_set(CMS_cms_has_token |
>>>>> CMS_cms_wants_token),
>>>>> "Should not be set");
>>>>> }
>>>>>
>>>>> // Extra wait time check before entering the heap lock to get
>>>>> the collection count
>>>>> if(t_millis != 0 && os::elapsedTime() >= end_time_secs) {
>>>>> // Wait time is over
>>>>> break;
>>>>> }
>>>>>
>>>>> // Total collections count after the event
>>>>> unsigned int after_count;
>>>>> {
>>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>>> after_count = gch->total_collections();
>>>>> }
>>>>>
>>>>> if(before_count != after_count) {
>>>>> // There was a collection - success
>>>>> break;
>>>>> }
>>>>>
>>>>> // Too many loops warning
>>>>> if(++loop_count == 0) {
>>>>> warning("wait_on_cms_lock_for_scavenge() has looped %d
>>>>> times", loop_count - 1);
>>>>> }
>>>>> }
>>>>> }
>>>>>
>>>>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
>>>>> while (!_should_terminate) {
>>>>> if (CMSIncrementalMode) {
>>>>> icms_wait();
>>>>> if(CMSWaitDuration >= 0) {
>>>>> // Wait until the next synchronous GC, a concurrent full gc
>>>>> // request or a timeout, whichever is earlier.
>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>> }
>>>>> return;
>>>>> } else {
>>>>> if(CMSWaitDuration >= 0) {
>>>>> // Wait until the next synchronous GC, a concurrent full gc
>>>>> // request or a timeout, whichever is earlier.
>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>> } else {
>>>>> // Wait until any cms_lock event not to call
>>>>> shouldConcurrentCollect permanently
>>>>> wait_on_cms_lock(0);
>>>>> }
>>>>> }
>>>>> // Check if we should start a CMS collection cycle
>>>>> if (_collector->shouldConcurrentCollect()) {
>>>>> return;
>>>>> }
>>>>> // .. collection criterion not yet met, let's go back
>>>>> // and wait some more
>>>>> }
>>>>> }
>>>>>
>>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>>>> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson"
>>>>> john.cuthbertson at oracle.com
>>>>> Kopie: hotspot-gc-dev at openjdk.java.net
>>>>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
>>>>> non-incremental mode of CMS
>>>>>
>>>>>> Hi John --
>>>>>>
>>>>>> wrt the changes posted, i see the intent of the code and agree with
>>>>>> it. I have a few minor suggestions on the
>>>>>> details of how it's implemented. My comments are inline below,
>>>>>> interleaved with the code:
>>>>>>
>>>>>> 317 // Wait until the next synchronous GC, a concurrent full gc
>>>>>> request,
>>>>>> 318 // or a timeout, whichever is earlier.
>>>>>> 319 void
>>>>>> ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>>> t_millis) {
>>>>>> 320 // Wait for any cms_lock event when timeout not specified
>>>>>> (0 millis)
>>>>>> 321 if (t_millis == 0) {
>>>>>> 322 wait_on_cms_lock(t_millis);
>>>>>> 323 return;
>>>>>> 324 }
>>>>>>
>>>>>> I'd completely avoid the special case above because it would miss the
>>>>>> part about waiting for a
>>>>>> scavenge, instead dealing with that case in the code in the loop below
>>>>>> directly. The idea
>>>>>> of the "0" value is not to ask that we return immediately, but that we
>>>>>> wait, if necessary
>>>>>> forever, for a scavenge. The "0" really represents the value infinity
>>>>>> in that sense. This would
>>>>>> be in keeping with our use of wait() with a "0" value for timeout at
>>>>>> other places in the JVM as
>>>>>> well, so it's consistent.
>>>>>>
>>>>>> 325
>>>>>> 326 GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>>> 327 double start_time = os::elapsedTime();
>>>>>> 328 double end_time = start_time + (t_millis / 1000.0);
>>>>>>
>>>>>> Note how, the end_time == start_time for the special case of t_millis
>>>>>> == 0, so we need to treat that
>>>>>> case specially below.
>>>>>>
>>>>>> 329
>>>>>> 330 // Total collections count before waiting loop
>>>>>> 331 unsigned int before_count;
>>>>>> 332 {
>>>>>> 333 MutexLockerEx hl(Heap_lock,
>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>> 334 before_count = gch->total_collections();
>>>>>> 335 }
>>>>>>
>>>>>> Good.
>>>>>>
>>>>>> 336
>>>>>> 337 while (true) {
>>>>>> 338 double now_time = os::elapsedTime();
>>>>>> 339 long wait_time_millis = (long)((end_time - now_time) *
>>>>>> 1000.0);
>>>>>> 340
>>>>>> 341 if (wait_time_millis <= 0) {
>>>>>> 342 // Wait time is over
>>>>>> 343 break;
>>>>>> 344 }
>>>>>>
>>>>>> Modify to:
>>>>>> if (t_millis != 0) {
>>>>>> if (wait_time_millis <= 0) {
>>>>>> // Wait time is over
>>>>>> break;
>>>>>> }
>>>>>> } else {
>>>>>> wait_time_millis = 0; // for use in wait() below
>>>>>> }
>>>>>>
>>>>>> 345
>>>>>> 346 // Wait until the next event or the remaining timeout
>>>>>> 347 {
>>>>>> 348 MutexLockerEx x(CGC_lock,
>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>> 349 if (_should_terminate || _collector->_full_gc_requested) {
>>>>>> 350 return;
>>>>>> 351 }
>>>>>> 352 set_CMS_flag(CMS_cms_wants_token); // to provoke
>>>>>> notifies
>>>>>>
>>>>>> insert: assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>>>>>>
>>>>>> 353 CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>>> wait_time_millis);
>>>>>> 354 clear_CMS_flag(CMS_cms_wants_token);
>>>>>> 355 assert(!CMS_flag_is_set(CMS_cms_has_token |
>>>>>> CMS_cms_wants_token),
>>>>>> 356 "Should not be set");
>>>>>> 357 }
>>>>>> 358
>>>>>> 359 // Extra wait time check before entering the heap lock to
>>>>>> get
>>>>>> the collection count
>>>>>> 360 if (os::elapsedTime() >= end_time) {
>>>>>> 361 // Wait time is over
>>>>>> 362 break;
>>>>>> 363 }
>>>>>>
>>>>>> Modify above wait time check to make an exception for t_miliis == 0:
>>>>>> // Extra wait time check before checking collection count
>>>>>> if (t_millis != 0 && os::elapsedTime() >= end_time) {
>>>>>> // wait time exceeded
>>>>>> break;
>>>>>> }
>>>>>>
>>>>>> 364
>>>>>> 365 // Total collections count after the event
>>>>>> 366 unsigned int after_count;
>>>>>> 367 {
>>>>>> 368 MutexLockerEx hl(Heap_lock,
>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>> 369 after_count = gch->total_collections();
>>>>>> 370 }
>>>>>> 371
>>>>>> 372 if (before_count != after_count) {
>>>>>> 373 // There was a collection - success
>>>>>> 374 break;
>>>>>> 375 }
>>>>>> 376 }
>>>>>> 377 }
>>>>>>
>>>>>> While it is true that we do not have a case where the method is called
>>>>>> with a time of "0", I think we
>>>>>> want that value to be treated correctly as "infinity". For the case
>>>>>> where we do not want a wait at all,
>>>>>> we should use a small positive value, like "1 ms" to signal that
>>>>>> intent, i.e. -XX:CMSWaitDuration=1,
>>>>>> reserving CMSWaitDuration=0 to signal infinity. (We could also do that
>>>>>> by reserving negative values to
>>>>>> signal infinity, but that would make the code in the loop a bit
>>>>>> more fiddly.)
>>>>>>
>>>>>> As mentioned in my previous email, I'd like to see this tested with
>>>>>> CMSWaitDuration set to 0, positive and
>>>>>> negative values (if necessary, we can reject negative value settings),
>>>>>> and with ExplicitGCInvokesConcurrent.
>>>>>>
>>>>>> Rest looks OK to me, although I am not sure how this behaves with
>>>>>> iCMS, as I have forgotten that part of the
>>>>>> code.
>>>>>>
>>>>>> Finally, in current code (before these changes) there are two callers
>>>>>> of the former wait_for_cms_lock() method,
>>>>>> one here in sleepBeforeNextCycle() and one from the precleaning loop.
>>>>>> I think the right thing has been done
>>>>>> in terms of leaving the latter alone.
>>>>>>
>>>>>> It would be good if this were checked with CMSInitiatingOccupancy set
>>>>>> to 0 (or a small value), CMSWaitDuration set to 0,
>>>>>> -+PromotionFailureALot and checking that (1) it does not deadlock (2)
>>>>>> CMS cycles start very soon after the end of
>>>>>> a scavenge (and not at random times as Michal has observed earlier,
>>>>>> although i am guessing that is difficult to test).
>>>>>> It would be good to repeat the above test with iCMS as well.
>>>>>>
>>>>>> thanks!
>>>>>> -- ramki
>>>>>>
>>>>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna wrote:
>>>>>>> Thanks Jon for the pointer:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu wrote:
>>>>>>>>
>>>>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>>>>>>> The high level idea looks correct. I'll look at the details in a
>>>>>>>>> bit (seriously this time; sorry it dropped off my plate last
>>>>>>>>> time I promised).
>>>>>>>>> Does anyone have a pointer to the related discussion thread on
>>>>>>>>> this aias from earlier in the year, by chance, so one could
>>>>>>>>> refresh one's
>>>>>>>>> memory of that discussion?
>>>>>>>> subj: CMSWaitDuration unstable behavior
>>>>>>>>
>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> also:
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
>>>>>>>
>>>>>>> On to it later this afternoon, and TTYL w/review.
>>>>>>> - ramki




More information about the hotspot-gc-dev mailing list