RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
Jon Masamitsu
jon.masamitsu at oracle.com
Sat Feb 2 01:05:03 UTC 2013
The changes look good. I've patched this into a
repository and will start some testing and
performance runs.
On 1/30/2013 3:35 AM, Michal Frajt wrote:
> Hi John,
>
> We have verified the new webrev. The changes have been applied correctly.
>
> Regards,
> Michal
>
>
> Od: "John Cuthbertson" john.cuthbertson at oracle.com
> Komu: "Michal Frajt" michal at frajt.eu
> Kopie: hotspot-gc-dev at openjdk.java.net
> Datum: Fri, 25 Jan 2013 13:13:27 -0800
> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
>
>> Hi Michal,
>>
>> The patch is applied. The new webrev can be found at
>> http://cr.openjdk.java.net/~johnc/7189971/webrev.2/
>>
>> Thanks,
>>
>> JohnC
>>
>> On 1/17/2013 2:58 AM, Michal Frajt wrote:
>>> Hi John,
>>>
>>> Please apply the attached patch to the webrev. You are right, the setting of the CMS token has been somehow moved back above the method return. Additionally I have fixed the printf of the unsigned loop counter (correct is %u).
>>>
>>> Regards,
>>> Michal
>>>
>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>> Komu: hotspot-gc-dev at openjdk.java.net
>>> Kopie:
>>> Datum: Thu, 10 Jan 2013 12:20:28 -0800
>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
>>>
>>>
>>>> Hi Michal,
>>>>
>>>> On 1/10/2013 11:46 AM, John Cuthbertson wrote:
>>>>> Hi Michal,
>>>>>
>>>>> Many apologies for the delay in generating a new webrev for this
>>>>> change but here is the new one:
>>>>> http://cr.openjdk.java.net/~johnc/7189971/webrev.1/
>>>>>
>>>>> Can you verify the webrev to make sure that changes have been applied
>>>>> correctly? Looking at the new webrev it seems that the setting of the
>>>>> CMS has been moved back above the return out of the loop. Was this
>>>>> intentional?
>>>> The above should be "... setting of the CMS token has been ...".
>>>>
>>>> JohnC
>>>>
>>>>> I've done a couple of sanity tests with GCOld with CMSWaitDuration=0
>>>>> and CMSWaitDuration=1500 with CMS.
>>>>>
>>>>> Regards,
>>>>>
>>>>> JohnC
>>>>>
>>>>> On 12/12/2012 4:35 AM, Michal Frajt wrote:
>>>>>> All,
>>>>>> Find the attached patch. It implements proposed recommendations and
>>>>>> requested changes. Please mind that the CMSWaitDuration set to -1
>>>>>> (never wait) requires new parameter CMSCheckInterval (develop only,
>>>>>> 1000 milliseconds default - constant). The parameter defines the
>>>>>> next CMS cycle start check interval in the case there are no
>>>>>> desynchronization (notifications) events on the CGC_lock.
>>>>>>
>>>>>> Tested with the Solaris/amd64 build
>>>>>> CMS
>>>>>> + CMSWaitDuration>0 OK
>>>>>> + CMSWaitDuration=0 OK
>>>>>> + CMSWaitDuration<0 OK
>>>>>> iCMS
>>>>>> + CMSWaitDuration>0 OK
>>>>>> + CMSWaitDuration=0 OK
>>>>>> + CMSWaitDuration<0 OK
>>>>>> Regards,
>>>>>> Michal
>>>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>>>>> Komu: hotspot-gc-dev at openjdk.java.net
>>>>>> Kopie:
>>>>>> Datum: Fri, 7 Dec 2012 18:48:48 +0100
>>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
>>>>>> non-incremental mode of CMS
>>>>>>
>>>>>>> Hi John/Jon/Ramki,
>>>>>>>
>>>>>>> All proposed recommendations and requested changes have been
>>>>>>> implemented. We are going to test it on Monday. You will get the new
>>>>>>> tested patch soon.
>>>>>>>
>>>>>>> The attached code here just got compiled, no test executed yet, it
>>>>>>> might contain a bug, but you can quickly review it and send your
>>>>>>> comments.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Michal
>>>>>>>
>>>>>>>
>>>>>>> // Wait until the next synchronous GC, a concurrent full gc request,
>>>>>>> // or a timeout, whichever is earlier.
>>>>>>> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>>>> t_millis) {
>>>>>>> // Wait time in millis or 0 value representing infinite wait for
>>>>>>> a scavenge
>>>>>>> assert(t_millis>= 0, "Wait time for scavenge should be 0 or
>>>>>>> positive");
>>>>>>>
>>>>>>> GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>>>> double start_time_secs = os::elapsedTime();
>>>>>>> double end_time_secs = start_time_secs + (t_millis / ((double)
>>>>>>> MILLIUNITS));
>>>>>>>
>>>>>>> // Total collections count before waiting loop
>>>>>>> unsigned int before_count;
>>>>>>> {
>>>>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>>>>> before_count = gch->total_collections();
>>>>>>> }
>>>>>>>
>>>>>>> unsigned int loop_count = 0;
>>>>>>>
>>>>>>> while(!_should_terminate) {
>>>>>>> double now_time = os::elapsedTime();
>>>>>>> long wait_time_millis;
>>>>>>>
>>>>>>> if(t_millis != 0) {
>>>>>>> // New wait limit
>>>>>>> wait_time_millis = (long) ((end_time_secs - now_time) *
>>>>>>> MILLIUNITS);
>>>>>>> if(wait_time_millis<= 0) {
>>>>>>> // Wait time is over
>>>>>>> break;
>>>>>>> }
>>>>>>> } else {
>>>>>>> // No wait limit, wait if necessary forever
>>>>>>> wait_time_millis = 0;
>>>>>>> }
>>>>>>>
>>>>>>> // Wait until the next event or the remaining timeout
>>>>>>> {
>>>>>>> MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>>>>>>
>>>>>>> set_CMS_flag(CMS_cms_wants_token); // to provoke notifies
>>>>>>> if (_should_terminate || _collector->_full_gc_requested) {
>>>>>>> return;
>>>>>>> }
>>>>>>> assert(t_millis == 0 || wait_time_millis> 0, "Sanity");
>>>>>>> CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>>>> wait_time_millis);
>>>>>>> clear_CMS_flag(CMS_cms_wants_token);
>>>>>>> assert(!CMS_flag_is_set(CMS_cms_has_token |
>>>>>>> CMS_cms_wants_token),
>>>>>>> "Should not be set");
>>>>>>> }
>>>>>>>
>>>>>>> // Extra wait time check before entering the heap lock to get
>>>>>>> the collection count
>>>>>>> if(t_millis != 0&& os::elapsedTime()>= end_time_secs) {
>>>>>>> // Wait time is over
>>>>>>> break;
>>>>>>> }
>>>>>>>
>>>>>>> // Total collections count after the event
>>>>>>> unsigned int after_count;
>>>>>>> {
>>>>>>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>>>>>> after_count = gch->total_collections();
>>>>>>> }
>>>>>>>
>>>>>>> if(before_count != after_count) {
>>>>>>> // There was a collection - success
>>>>>>> break;
>>>>>>> }
>>>>>>>
>>>>>>> // Too many loops warning
>>>>>>> if(++loop_count == 0) {
>>>>>>> warning("wait_on_cms_lock_for_scavenge() has looped %d
>>>>>>> times", loop_count - 1);
>>>>>>> }
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
>>>>>>> while (!_should_terminate) {
>>>>>>> if (CMSIncrementalMode) {
>>>>>>> icms_wait();
>>>>>>> if(CMSWaitDuration>= 0) {
>>>>>>> // Wait until the next synchronous GC, a concurrent full gc
>>>>>>> // request or a timeout, whichever is earlier.
>>>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>>>> }
>>>>>>> return;
>>>>>>> } else {
>>>>>>> if(CMSWaitDuration>= 0) {
>>>>>>> // Wait until the next synchronous GC, a concurrent full gc
>>>>>>> // request or a timeout, whichever is earlier.
>>>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>>>> } else {
>>>>>>> // Wait until any cms_lock event not to call
>>>>>>> shouldConcurrentCollect permanently
>>>>>>> wait_on_cms_lock(0);
>>>>>>> }
>>>>>>> }
>>>>>>> // Check if we should start a CMS collection cycle
>>>>>>> if (_collector->shouldConcurrentCollect()) {
>>>>>>> return;
>>>>>>> }
>>>>>>> // .. collection criterion not yet met, let's go back
>>>>>>> // and wait some more
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>>>>>> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson"
>>>>>>> john.cuthbertson at oracle.com
>>>>>>> Kopie: hotspot-gc-dev at openjdk.java.net
>>>>>>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
>>>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for
>>>>>>> non-incremental mode of CMS
>>>>>>>
>>>>>>>> Hi John --
>>>>>>>>
>>>>>>>> wrt the changes posted, i see the intent of the code and agree with
>>>>>>>> it. I have a few minor suggestions on the
>>>>>>>> details of how it's implemented. My comments are inline below,
>>>>>>>> interleaved with the code:
>>>>>>>>
>>>>>>>> 317 // Wait until the next synchronous GC, a concurrent full gc
>>>>>>>> request,
>>>>>>>> 318 // or a timeout, whichever is earlier.
>>>>>>>> 319 void
>>>>>>>> ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>>>>> t_millis) {
>>>>>>>> 320 // Wait for any cms_lock event when timeout not specified
>>>>>>>> (0 millis)
>>>>>>>> 321 if (t_millis == 0) {
>>>>>>>> 322 wait_on_cms_lock(t_millis);
>>>>>>>> 323 return;
>>>>>>>> 324 }
>>>>>>>>
>>>>>>>> I'd completely avoid the special case above because it would miss the
>>>>>>>> part about waiting for a
>>>>>>>> scavenge, instead dealing with that case in the code in the loop below
>>>>>>>> directly. The idea
>>>>>>>> of the "0" value is not to ask that we return immediately, but that we
>>>>>>>> wait, if necessary
>>>>>>>> forever, for a scavenge. The "0" really represents the value infinity
>>>>>>>> in that sense. This would
>>>>>>>> be in keeping with our use of wait() with a "0" value for timeout at
>>>>>>>> other places in the JVM as
>>>>>>>> well, so it's consistent.
>>>>>>>>
>>>>>>>> 325
>>>>>>>> 326 GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>>>>> 327 double start_time = os::elapsedTime();
>>>>>>>> 328 double end_time = start_time + (t_millis / 1000.0);
>>>>>>>>
>>>>>>>> Note how, the end_time == start_time for the special case of t_millis
>>>>>>>> == 0, so we need to treat that
>>>>>>>> case specially below.
>>>>>>>>
>>>>>>>> 329
>>>>>>>> 330 // Total collections count before waiting loop
>>>>>>>> 331 unsigned int before_count;
>>>>>>>> 332 {
>>>>>>>> 333 MutexLockerEx hl(Heap_lock,
>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>> 334 before_count = gch->total_collections();
>>>>>>>> 335 }
>>>>>>>>
>>>>>>>> Good.
>>>>>>>>
>>>>>>>> 336
>>>>>>>> 337 while (true) {
>>>>>>>> 338 double now_time = os::elapsedTime();
>>>>>>>> 339 long wait_time_millis = (long)((end_time - now_time) *
>>>>>>>> 1000.0);
>>>>>>>> 340
>>>>>>>> 341 if (wait_time_millis<= 0) {
>>>>>>>> 342 // Wait time is over
>>>>>>>> 343 break;
>>>>>>>> 344 }
>>>>>>>>
>>>>>>>> Modify to:
>>>>>>>> if (t_millis != 0) {
>>>>>>>> if (wait_time_millis<= 0) {
>>>>>>>> // Wait time is over
>>>>>>>> break;
>>>>>>>> }
>>>>>>>> } else {
>>>>>>>> wait_time_millis = 0; // for use in wait() below
>>>>>>>> }
>>>>>>>>
>>>>>>>> 345
>>>>>>>> 346 // Wait until the next event or the remaining timeout
>>>>>>>> 347 {
>>>>>>>> 348 MutexLockerEx x(CGC_lock,
>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>> 349 if (_should_terminate || _collector->_full_gc_requested) {
>>>>>>>> 350 return;
>>>>>>>> 351 }
>>>>>>>> 352 set_CMS_flag(CMS_cms_wants_token); // to provoke
>>>>>>>> notifies
>>>>>>>>
>>>>>>>> insert: assert(t_millis == 0 || wait_time_millis> 0, "Sanity");
>>>>>>>>
>>>>>>>> 353 CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>>>>> wait_time_millis);
>>>>>>>> 354 clear_CMS_flag(CMS_cms_wants_token);
>>>>>>>> 355 assert(!CMS_flag_is_set(CMS_cms_has_token |
>>>>>>>> CMS_cms_wants_token),
>>>>>>>> 356 "Should not be set");
>>>>>>>> 357 }
>>>>>>>> 358
>>>>>>>> 359 // Extra wait time check before entering the heap lock to
>>>>>>>> get
>>>>>>>> the collection count
>>>>>>>> 360 if (os::elapsedTime()>= end_time) {
>>>>>>>> 361 // Wait time is over
>>>>>>>> 362 break;
>>>>>>>> 363 }
>>>>>>>>
>>>>>>>> Modify above wait time check to make an exception for t_miliis == 0:
>>>>>>>> // Extra wait time check before checking collection count
>>>>>>>> if (t_millis != 0&& os::elapsedTime()>= end_time) {
>>>>>>>> // wait time exceeded
>>>>>>>> break;
>>>>>>>> }
>>>>>>>>
>>>>>>>> 364
>>>>>>>> 365 // Total collections count after the event
>>>>>>>> 366 unsigned int after_count;
>>>>>>>> 367 {
>>>>>>>> 368 MutexLockerEx hl(Heap_lock,
>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>> 369 after_count = gch->total_collections();
>>>>>>>> 370 }
>>>>>>>> 371
>>>>>>>> 372 if (before_count != after_count) {
>>>>>>>> 373 // There was a collection - success
>>>>>>>> 374 break;
>>>>>>>> 375 }
>>>>>>>> 376 }
>>>>>>>> 377 }
>>>>>>>>
>>>>>>>> While it is true that we do not have a case where the method is called
>>>>>>>> with a time of "0", I think we
>>>>>>>> want that value to be treated correctly as "infinity". For the case
>>>>>>>> where we do not want a wait at all,
>>>>>>>> we should use a small positive value, like "1 ms" to signal that
>>>>>>>> intent, i.e. -XX:CMSWaitDuration=1,
>>>>>>>> reserving CMSWaitDuration=0 to signal infinity. (We could also do that
>>>>>>>> by reserving negative values to
>>>>>>>> signal infinity, but that would make the code in the loop a bit
>>>>>>>> more fiddly.)
>>>>>>>>
>>>>>>>> As mentioned in my previous email, I'd like to see this tested with
>>>>>>>> CMSWaitDuration set to 0, positive and
>>>>>>>> negative values (if necessary, we can reject negative value settings),
>>>>>>>> and with ExplicitGCInvokesConcurrent.
>>>>>>>>
>>>>>>>> Rest looks OK to me, although I am not sure how this behaves with
>>>>>>>> iCMS, as I have forgotten that part of the
>>>>>>>> code.
>>>>>>>>
>>>>>>>> Finally, in current code (before these changes) there are two callers
>>>>>>>> of the former wait_for_cms_lock() method,
>>>>>>>> one here in sleepBeforeNextCycle() and one from the precleaning loop.
>>>>>>>> I think the right thing has been done
>>>>>>>> in terms of leaving the latter alone.
>>>>>>>>
>>>>>>>> It would be good if this were checked with CMSInitiatingOccupancy set
>>>>>>>> to 0 (or a small value), CMSWaitDuration set to 0,
>>>>>>>> -+PromotionFailureALot and checking that (1) it does not deadlock (2)
>>>>>>>> CMS cycles start very soon after the end of
>>>>>>>> a scavenge (and not at random times as Michal has observed earlier,
>>>>>>>> although i am guessing that is difficult to test).
>>>>>>>> It would be good to repeat the above test with iCMS as well.
>>>>>>>>
>>>>>>>> thanks!
>>>>>>>> -- ramki
>>>>>>>>
>>>>>>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna wrote:
>>>>>>>>> Thanks Jon for the pointer:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu wrote:
>>>>>>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>>>>>>>>> The high level idea looks correct. I'll look at the details in a
>>>>>>>>>>> bit (seriously this time; sorry it dropped off my plate last
>>>>>>>>>>> time I promised).
>>>>>>>>>>> Does anyone have a pointer to the related discussion thread on
>>>>>>>>>>> this aias from earlier in the year, by chance, so one could
>>>>>>>>>>> refresh one's
>>>>>>>>>>> memory of that discussion?
>>>>>>>>>> subj: CMSWaitDuration unstable behavior
>>>>>>>>>>
>>>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> also:
>>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
>>>>>>>>>
>>>>>>>>> On to it later this afternoon, and TTYL w/review.
>>>>>>>>> - ramki
More information about the hotspot-gc-dev
mailing list