RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

Michal Frajt michal at frajt.eu
Wed Jan 30 11:35:47 UTC 2013


Hi John,

We have verified the new webrev. The changes have been applied correctly.

Regards,
Michal


Od: "John Cuthbertson" john.cuthbertson at oracle.com
Komu: "Michal Frajt" michal at frajt.eu
Kopie: hotspot-gc-dev at openjdk.java.net
Datum: Fri, 25 Jan 2013 13:13:27 -0800
Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

> Hi Michal,
> 
> The patch is applied. The new webrev can be found at 
> http://cr.openjdk.java.net/~johnc/7189971/webrev.2/
> 
> Thanks,
> 
> JohnC
> 
> On 1/17/2013 2:58 AM, Michal Frajt wrote:
> > Hi John,
> >
> > Please apply the attached patch to the webrev. You are right, the setting of the CMS token has been somehow moved back above the method return. Additionally I have fixed the printf of the unsigned loop counter (correct is %u).
> >
> > Regards,
> > Michal
> >   
> > Od: hotspot-gc-dev-bounces at openjdk.java.net
> > Komu: hotspot-gc-dev at openjdk.java.net
> > Kopie:
> > Datum: Thu, 10 Jan 2013 12:20:28 -0800
> > Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
> >
> >
> >> Hi Michal,
> >>
> >> On 1/10/2013 11:46 AM, John Cuthbertson wrote:
> >>> Hi Michal,
> >>>
> >>> Many apologies for the delay in generating a new webrev for this
> >>> change but here is the new one:
> >>> http://cr.openjdk.java.net/~johnc/7189971/webrev.1/
> >>>
> >>> Can you verify the webrev to make sure that changes have been applied
> >>> correctly? Looking at the new webrev it seems that the setting of the
> >>> CMS has been moved back above the return out of the loop. Was this
> >>> intentional?
> >> The above should be "... setting of the CMS token has been ...".
> >>
> >> JohnC
> >>
> >>> I've done a couple of sanity tests with GCOld with CMSWaitDuration=0
> >>> and CMSWaitDuration=1500 with CMS.
> >>>
> >>> Regards,
> >>>
> >>> JohnC
> >>>
> >>> On 12/12/2012 4:35 AM, Michal Frajt wrote:
> >>>> All,
> >>>> Find the attached patch. It implements proposed recommendations and
> >>>> requested changes. Please mind that the CMSWaitDuration set to -1
> >>>> (never wait) requires new parameter CMSCheckInterval (develop only,
> >>>> 1000 milliseconds default - constant). The parameter defines the
> >>>> next CMS cycle start check interval in the case there are no
> >>>> desynchronization (notifications) events on the CGC_lock.
> >>>>
> >>>> Tested with the Solaris/amd64 build
> >>>> CMS
> >>>> + CMSWaitDuration>0 OK
> >>>> + CMSWaitDuration=0 OK
> >>>> + CMSWaitDuration<0 OK
> >>>> iCMS
> >>>> + CMSWaitDuration>0 OK
> >>>> + CMSWaitDuration=0 OK
> >>>> + CMSWaitDuration<0 OK
> >>>> Regards,
> >>>> Michal
> >>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
> >>>> Komu: hotspot-gc-dev at openjdk.java.net
> >>>> Kopie:
> >>>> Datum: Fri, 7 Dec 2012 18:48:48 +0100
> >>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
> >>>> non-incremental mode of CMS
> >>>>
> >>>>> Hi John/Jon/Ramki,
> >>>>>
> >>>>> All proposed recommendations and requested changes have been
> >>>>> implemented. We are going to test it on Monday. You will get the new
> >>>>> tested patch soon.
> >>>>>
> >>>>> The attached code here just got compiled, no test executed yet, it
> >>>>> might contain a bug, but you can quickly review it and send your
> >>>>> comments.
> >>>>>
> >>>>> Best regards
> >>>>> Michal
> >>>>>
> >>>>>
> >>>>> // Wait until the next synchronous GC, a concurrent full gc request,
> >>>>> // or a timeout, whichever is earlier.
> >>>>> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
> >>>>> t_millis) {
> >>>>> // Wait time in millis or 0 value representing infinite wait for
> >>>>> a scavenge
> >>>>> assert(t_millis >= 0, "Wait time for scavenge should be 0 or
> >>>>> positive");
> >>>>>
> >>>>> GenCollectedHeap* gch = GenCollectedHeap::heap();
> >>>>> double start_time_secs = os::elapsedTime();
> >>>>> double end_time_secs = start_time_secs + (t_millis / ((double)
> >>>>> MILLIUNITS));
> >>>>>
> >>>>> // Total collections count before waiting loop
> >>>>> unsigned int before_count;
> >>>>> {
> >>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
> >>>>> before_count = gch->total_collections();
> >>>>> }
> >>>>>
> >>>>> unsigned int loop_count = 0;
> >>>>>
> >>>>> while(!_should_terminate) {
> >>>>> double now_time = os::elapsedTime();
> >>>>> long wait_time_millis;
> >>>>>
> >>>>> if(t_millis != 0) {
> >>>>> // New wait limit
> >>>>> wait_time_millis = (long) ((end_time_secs - now_time) *
> >>>>> MILLIUNITS);
> >>>>> if(wait_time_millis <= 0) {
> >>>>> // Wait time is over
> >>>>> break;
> >>>>> }
> >>>>> } else {
> >>>>> // No wait limit, wait if necessary forever
> >>>>> wait_time_millis = 0;
> >>>>> }
> >>>>>
> >>>>> // Wait until the next event or the remaining timeout
> >>>>> {
> >>>>> MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
> >>>>>
> >>>>> set_CMS_flag(CMS_cms_wants_token); // to provoke notifies
> >>>>> if (_should_terminate || _collector->_full_gc_requested) {
> >>>>> return;
> >>>>> }
> >>>>> assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
> >>>>> CGC_lock->wait(Mutex::_no_safepoint_check_flag,
> >>>>> wait_time_millis);
> >>>>> clear_CMS_flag(CMS_cms_wants_token);
> >>>>> assert(!CMS_flag_is_set(CMS_cms_has_token |
> >>>>> CMS_cms_wants_token),
> >>>>> "Should not be set");
> >>>>> }
> >>>>>
> >>>>> // Extra wait time check before entering the heap lock to get
> >>>>> the collection count
> >>>>> if(t_millis != 0 && os::elapsedTime() >= end_time_secs) {
> >>>>> // Wait time is over
> >>>>> break;
> >>>>> }
> >>>>>
> >>>>> // Total collections count after the event
> >>>>> unsigned int after_count;
> >>>>> {
> >>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
> >>>>> after_count = gch->total_collections();
> >>>>> }
> >>>>>
> >>>>> if(before_count != after_count) {
> >>>>> // There was a collection - success
> >>>>> break;
> >>>>> }
> >>>>>
> >>>>> // Too many loops warning
> >>>>> if(++loop_count == 0) {
> >>>>> warning("wait_on_cms_lock_for_scavenge() has looped %d
> >>>>> times", loop_count - 1);
> >>>>> }
> >>>>> }
> >>>>> }
> >>>>>
> >>>>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
> >>>>> while (!_should_terminate) {
> >>>>> if (CMSIncrementalMode) {
> >>>>> icms_wait();
> >>>>> if(CMSWaitDuration >= 0) {
> >>>>> // Wait until the next synchronous GC, a concurrent full gc
> >>>>> // request or a timeout, whichever is earlier.
> >>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
> >>>>> }
> >>>>> return;
> >>>>> } else {
> >>>>> if(CMSWaitDuration >= 0) {
> >>>>> // Wait until the next synchronous GC, a concurrent full gc
> >>>>> // request or a timeout, whichever is earlier.
> >>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
> >>>>> } else {
> >>>>> // Wait until any cms_lock event not to call
> >>>>> shouldConcurrentCollect permanently
> >>>>> wait_on_cms_lock(0);
> >>>>> }
> >>>>> }
> >>>>> // Check if we should start a CMS collection cycle
> >>>>> if (_collector->shouldConcurrentCollect()) {
> >>>>> return;
> >>>>> }
> >>>>> // .. collection criterion not yet met, let's go back
> >>>>> // and wait some more
> >>>>> }
> >>>>> }
> >>>>>
> >>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
> >>>>> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson"
> >>>>> john.cuthbertson at oracle.com
> >>>>> Kopie: hotspot-gc-dev at openjdk.java.net
> >>>>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
> >>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
> >>>>> non-incremental mode of CMS
> >>>>>
> >>>>>> Hi John --
> >>>>>>
> >>>>>> wrt the changes posted, i see the intent of the code and agree with
> >>>>>> it. I have a few minor suggestions on the
> >>>>>> details of how it's implemented. My comments are inline below,
> >>>>>> interleaved with the code:
> >>>>>>
> >>>>>> 317 // Wait until the next synchronous GC, a concurrent full gc
> >>>>>> request,
> >>>>>> 318 // or a timeout, whichever is earlier.
> >>>>>> 319 void
> >>>>>> ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
> >>>>>> t_millis) {
> >>>>>> 320 // Wait for any cms_lock event when timeout not specified
> >>>>>> (0 millis)
> >>>>>> 321 if (t_millis == 0) {
> >>>>>> 322 wait_on_cms_lock(t_millis);
> >>>>>> 323 return;
> >>>>>> 324 }
> >>>>>>
> >>>>>> I'd completely avoid the special case above because it would miss the
> >>>>>> part about waiting for a
> >>>>>> scavenge, instead dealing with that case in the code in the loop below
> >>>>>> directly. The idea
> >>>>>> of the "0" value is not to ask that we return immediately, but that we
> >>>>>> wait, if necessary
> >>>>>> forever, for a scavenge. The "0" really represents the value infinity
> >>>>>> in that sense. This would
> >>>>>> be in keeping with our use of wait() with a "0" value for timeout at
> >>>>>> other places in the JVM as
> >>>>>> well, so it's consistent.
> >>>>>>
> >>>>>> 325
> >>>>>> 326 GenCollectedHeap* gch = GenCollectedHeap::heap();
> >>>>>> 327 double start_time = os::elapsedTime();
> >>>>>> 328 double end_time = start_time + (t_millis / 1000.0);
> >>>>>>
> >>>>>> Note how, the end_time == start_time for the special case of t_millis
> >>>>>> == 0, so we need to treat that
> >>>>>> case specially below.
> >>>>>>
> >>>>>> 329
> >>>>>> 330 // Total collections count before waiting loop
> >>>>>> 331 unsigned int before_count;
> >>>>>> 332 {
> >>>>>> 333 MutexLockerEx hl(Heap_lock,
> >>>>>> Mutex::_no_safepoint_check_flag);
> >>>>>> 334 before_count = gch->total_collections();
> >>>>>> 335 }
> >>>>>>
> >>>>>> Good.
> >>>>>>
> >>>>>> 336
> >>>>>> 337 while (true) {
> >>>>>> 338 double now_time = os::elapsedTime();
> >>>>>> 339 long wait_time_millis = (long)((end_time - now_time) *
> >>>>>> 1000.0);
> >>>>>> 340
> >>>>>> 341 if (wait_time_millis <= 0) {
> >>>>>> 342 // Wait time is over
> >>>>>> 343 break;
> >>>>>> 344 }
> >>>>>>
> >>>>>> Modify to:
> >>>>>> if (t_millis != 0) {
> >>>>>> if (wait_time_millis <= 0) {
> >>>>>> // Wait time is over
> >>>>>> break;
> >>>>>> }
> >>>>>> } else {
> >>>>>> wait_time_millis = 0; // for use in wait() below
> >>>>>> }
> >>>>>>
> >>>>>> 345
> >>>>>> 346 // Wait until the next event or the remaining timeout
> >>>>>> 347 {
> >>>>>> 348 MutexLockerEx x(CGC_lock,
> >>>>>> Mutex::_no_safepoint_check_flag);
> >>>>>> 349 if (_should_terminate || _collector->_full_gc_requested) {
> >>>>>> 350 return;
> >>>>>> 351 }
> >>>>>> 352 set_CMS_flag(CMS_cms_wants_token); // to provoke
> >>>>>> notifies
> >>>>>>
> >>>>>> insert: assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
> >>>>>>
> >>>>>> 353 CGC_lock->wait(Mutex::_no_safepoint_check_flag,
> >>>>>> wait_time_millis);
> >>>>>> 354 clear_CMS_flag(CMS_cms_wants_token);
> >>>>>> 355 assert(!CMS_flag_is_set(CMS_cms_has_token |
> >>>>>> CMS_cms_wants_token),
> >>>>>> 356 "Should not be set");
> >>>>>> 357 }
> >>>>>> 358
> >>>>>> 359 // Extra wait time check before entering the heap lock to
> >>>>>> get
> >>>>>> the collection count
> >>>>>> 360 if (os::elapsedTime() >= end_time) {
> >>>>>> 361 // Wait time is over
> >>>>>> 362 break;
> >>>>>> 363 }
> >>>>>>
> >>>>>> Modify above wait time check to make an exception for t_miliis == 0:
> >>>>>> // Extra wait time check before checking collection count
> >>>>>> if (t_millis != 0 && os::elapsedTime() >= end_time) {
> >>>>>> // wait time exceeded
> >>>>>> break;
> >>>>>> }
> >>>>>>
> >>>>>> 364
> >>>>>> 365 // Total collections count after the event
> >>>>>> 366 unsigned int after_count;
> >>>>>> 367 {
> >>>>>> 368 MutexLockerEx hl(Heap_lock,
> >>>>>> Mutex::_no_safepoint_check_flag);
> >>>>>> 369 after_count = gch->total_collections();
> >>>>>> 370 }
> >>>>>> 371
> >>>>>> 372 if (before_count != after_count) {
> >>>>>> 373 // There was a collection - success
> >>>>>> 374 break;
> >>>>>> 375 }
> >>>>>> 376 }
> >>>>>> 377 }
> >>>>>>
> >>>>>> While it is true that we do not have a case where the method is called
> >>>>>> with a time of "0", I think we
> >>>>>> want that value to be treated correctly as "infinity". For the case
> >>>>>> where we do not want a wait at all,
> >>>>>> we should use a small positive value, like "1 ms" to signal that
> >>>>>> intent, i.e. -XX:CMSWaitDuration=1,
> >>>>>> reserving CMSWaitDuration=0 to signal infinity. (We could also do that
> >>>>>> by reserving negative values to
> >>>>>> signal infinity, but that would make the code in the loop a bit
> >>>>>> more fiddly.)
> >>>>>>
> >>>>>> As mentioned in my previous email, I'd like to see this tested with
> >>>>>> CMSWaitDuration set to 0, positive and
> >>>>>> negative values (if necessary, we can reject negative value settings),
> >>>>>> and with ExplicitGCInvokesConcurrent.
> >>>>>>
> >>>>>> Rest looks OK to me, although I am not sure how this behaves with
> >>>>>> iCMS, as I have forgotten that part of the
> >>>>>> code.
> >>>>>>
> >>>>>> Finally, in current code (before these changes) there are two callers
> >>>>>> of the former wait_for_cms_lock() method,
> >>>>>> one here in sleepBeforeNextCycle() and one from the precleaning loop.
> >>>>>> I think the right thing has been done
> >>>>>> in terms of leaving the latter alone.
> >>>>>>
> >>>>>> It would be good if this were checked with CMSInitiatingOccupancy set
> >>>>>> to 0 (or a small value), CMSWaitDuration set to 0,
> >>>>>> -+PromotionFailureALot and checking that (1) it does not deadlock (2)
> >>>>>> CMS cycles start very soon after the end of
> >>>>>> a scavenge (and not at random times as Michal has observed earlier,
> >>>>>> although i am guessing that is difficult to test).
> >>>>>> It would be good to repeat the above test with iCMS as well.
> >>>>>>
> >>>>>> thanks!
> >>>>>> -- ramki
> >>>>>>
> >>>>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna wrote:
> >>>>>>> Thanks Jon for the pointer:
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu wrote:
> >>>>>>>>
> >>>>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
> >>>>>>>>> The high level idea looks correct. I'll look at the details in a
> >>>>>>>>> bit (seriously this time; sorry it dropped off my plate last
> >>>>>>>>> time I promised).
> >>>>>>>>> Does anyone have a pointer to the related discussion thread on
> >>>>>>>>> this aias from earlier in the year, by chance, so one could
> >>>>>>>>> refresh one's
> >>>>>>>>> memory of that discussion?
> >>>>>>>> subj: CMSWaitDuration unstable behavior
> >>>>>>>>
> >>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> also:
> >>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
> >>>>>>>
> >>>>>>> On to it later this afternoon, and TTYL w/review.
> >>>>>>> - ramki
> 




More information about the hotspot-gc-dev mailing list