RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
John Cuthbertson
john.cuthbertson at oracle.com
Thu Dec 13 23:33:39 UTC 2012
Hi Michal,
Thanks for the revised patch. I'll apply it, generate a revised webrev,
and send out a follow-up request.
JohnC
On 12/12/12 04:35, Michal Frajt wrote:
> All,
>
> Find the attached patch. It implements proposed recommendations and requested changes. Please mind that the CMSWaitDuration set to -1 (never wait) requires new parameter CMSCheckInterval (develop only, 1000 milliseconds default - constant). The parameter defines the next CMS cycle start check interval in the case there are no desynchronization (notifications) events on the CGC_lock.
>
> Tested with the Solaris/amd64 build
>
> CMS
> + CMSWaitDuration>0 OK
> + CMSWaitDuration=0 OK
> + CMSWaitDuration<0 OK
>
> iCMS
> + CMSWaitDuration>0 OK
> + CMSWaitDuration=0 OK
> + CMSWaitDuration<0 OK
>
> Regards,
> Michal
>
>
> Od: hotspot-gc-dev-bounces at openjdk.java.net
> Komu: hotspot-gc-dev at openjdk.java.net
> Kopie:
> Datum: Fri, 7 Dec 2012 18:48:48 +0100
> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
>
>
>> Hi John/Jon/Ramki,
>>
>> All proposed recommendations and requested changes have been implemented. We are going to test it on Monday. You will get the new tested patch soon.
>>
>> The attached code here just got compiled, no test executed yet, it might contain a bug, but you can quickly review it and send your comments.
>>
>> Best regards
>> Michal
>>
>>
>> // Wait until the next synchronous GC, a concurrent full gc request,
>> // or a timeout, whichever is earlier.
>> void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long t_millis) {
>> // Wait time in millis or 0 value representing infinite wait for a scavenge
>> assert(t_millis >= 0, "Wait time for scavenge should be 0 or positive");
>>
>> GenCollectedHeap* gch = GenCollectedHeap::heap();
>> double start_time_secs = os::elapsedTime();
>> double end_time_secs = start_time_secs + (t_millis / ((double) MILLIUNITS));
>>
>> // Total collections count before waiting loop
>> unsigned int before_count;
>> {
>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>> before_count = gch->total_collections();
>> }
>>
>> unsigned int loop_count = 0;
>>
>> while(!_should_terminate) {
>> double now_time = os::elapsedTime();
>> long wait_time_millis;
>>
>> if(t_millis != 0) {
>> // New wait limit
>> wait_time_millis = (long) ((end_time_secs - now_time) * MILLIUNITS);
>> if(wait_time_millis <= 0) {
>> // Wait time is over
>> break;
>> }
>> } else {
>> // No wait limit, wait if necessary forever
>> wait_time_millis = 0;
>> }
>>
>> // Wait until the next event or the remaining timeout
>> {
>> MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>
>> set_CMS_flag(CMS_cms_wants_token); // to provoke notifies
>> if (_should_terminate || _collector->_full_gc_requested) {
>> return;
>> }
>> assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>> CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
>> clear_CMS_flag(CMS_cms_wants_token);
>> assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
>> "Should not be set");
>> }
>>
>> // Extra wait time check before entering the heap lock to get the collection count
>> if(t_millis != 0 && os::elapsedTime() >= end_time_secs) {
>> // Wait time is over
>> break;
>> }
>>
>> // Total collections count after the event
>> unsigned int after_count;
>> {
>> MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>> after_count = gch->total_collections();
>> }
>>
>> if(before_count != after_count) {
>> // There was a collection - success
>> break;
>> }
>>
>> // Too many loops warning
>> if(++loop_count == 0) {
>> warning("wait_on_cms_lock_for_scavenge() has looped %d times", loop_count - 1);
>> }
>> }
>> }
>>
>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
>> while (!_should_terminate) {
>> if (CMSIncrementalMode) {
>> icms_wait();
>> if(CMSWaitDuration >= 0) {
>> // Wait until the next synchronous GC, a concurrent full gc
>> // request or a timeout, whichever is earlier.
>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>> }
>> return;
>> } else {
>> if(CMSWaitDuration >= 0) {
>> // Wait until the next synchronous GC, a concurrent full gc
>> // request or a timeout, whichever is earlier.
>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>> } else {
>> // Wait until any cms_lock event not to call shouldConcurrentCollect permanently
>> wait_on_cms_lock(0);
>> }
>> }
>> // Check if we should start a CMS collection cycle
>> if (_collector->shouldConcurrentCollect()) {
>> return;
>> }
>> // .. collection criterion not yet met, let's go back
>> // and wait some more
>> }
>> }
>>
>>
>>
>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson" john.cuthbertson at oracle.com
>> Kopie: hotspot-gc-dev at openjdk.java.net
>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS
>>
>>
>>> Hi John --
>>>
>>> wrt the changes posted, i see the intent of the code and agree with
>>> it. I have a few minor suggestions on the
>>> details of how it's implemented. My comments are inline below,
>>> interleaved with the code:
>>>
>>> 317 // Wait until the next synchronous GC, a concurrent full gc request,
>>> 318 // or a timeout, whichever is earlier.
>>> 319 void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>> t_millis) {
>>> 320 // Wait for any cms_lock event when timeout not specified (0 millis)
>>> 321 if (t_millis == 0) {
>>> 322 wait_on_cms_lock(t_millis);
>>> 323 return;
>>> 324 }
>>>
>>> I'd completely avoid the special case above because it would miss the
>>> part about waiting for a
>>> scavenge, instead dealing with that case in the code in the loop below
>>> directly. The idea
>>> of the "0" value is not to ask that we return immediately, but that we
>>> wait, if necessary
>>> forever, for a scavenge. The "0" really represents the value infinity
>>> in that sense. This would
>>> be in keeping with our use of wait() with a "0" value for timeout at
>>> other places in the JVM as
>>> well, so it's consistent.
>>>
>>> 325
>>> 326 GenCollectedHeap* gch = GenCollectedHeap::heap();
>>> 327 double start_time = os::elapsedTime();
>>> 328 double end_time = start_time + (t_millis / 1000.0);
>>>
>>> Note how, the end_time == start_time for the special case of t_millis
>>> == 0, so we need to treat that
>>> case specially below.
>>>
>>> 329
>>> 330 // Total collections count before waiting loop
>>> 331 unsigned int before_count;
>>> 332 {
>>> 333 MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>> 334 before_count = gch->total_collections();
>>> 335 }
>>>
>>> Good.
>>>
>>> 336
>>> 337 while (true) {
>>> 338 double now_time = os::elapsedTime();
>>> 339 long wait_time_millis = (long)((end_time - now_time) * 1000.0);
>>> 340
>>> 341 if (wait_time_millis <= 0) {
>>> 342 // Wait time is over
>>> 343 break;
>>> 344 }
>>>
>>> Modify to:
>>> if (t_millis != 0) {
>>> if (wait_time_millis <= 0) {
>>> // Wait time is over
>>> break;
>>> }
>>> } else {
>>> wait_time_millis = 0; // for use in wait() below
>>> }
>>>
>>> 345
>>> 346 // Wait until the next event or the remaining timeout
>>> 347 {
>>> 348 MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>> 349 if (_should_terminate || _collector->_full_gc_requested) {
>>> 350 return;
>>> 351 }
>>> 352 set_CMS_flag(CMS_cms_wants_token); // to provoke notifies
>>>
>>> insert: assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
>>>
>>> 353 CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
>>> 354 clear_CMS_flag(CMS_cms_wants_token);
>>> 355 assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
>>> 356 "Should not be set");
>>> 357 }
>>> 358
>>> 359 // Extra wait time check before entering the heap lock to get
>>> the collection count
>>> 360 if (os::elapsedTime() >= end_time) {
>>> 361 // Wait time is over
>>> 362 break;
>>> 363 }
>>>
>>> Modify above wait time check to make an exception for t_miliis == 0:
>>> // Extra wait time check before checking collection count
>>> if (t_millis != 0 && os::elapsedTime() >= end_time) {
>>> // wait time exceeded
>>> break;
>>> }
>>>
>>> 364
>>> 365 // Total collections count after the event
>>> 366 unsigned int after_count;
>>> 367 {
>>> 368 MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
>>> 369 after_count = gch->total_collections();
>>> 370 }
>>> 371
>>> 372 if (before_count != after_count) {
>>> 373 // There was a collection - success
>>> 374 break;
>>> 375 }
>>> 376 }
>>> 377 }
>>>
>>> While it is true that we do not have a case where the method is called
>>> with a time of "0", I think we
>>> want that value to be treated correctly as "infinity". For the case
>>> where we do not want a wait at all,
>>> we should use a small positive value, like "1 ms" to signal that
>>> intent, i.e. -XX:CMSWaitDuration=1,
>>> reserving CMSWaitDuration=0 to signal infinity. (We could also do that
>>> by reserving negative values to
>>> signal infinity, but that would make the code in the loop a bit more fiddly.)
>>>
>>> As mentioned in my previous email, I'd like to see this tested with
>>> CMSWaitDuration set to 0, positive and
>>> negative values (if necessary, we can reject negative value settings),
>>> and with ExplicitGCInvokesConcurrent.
>>>
>>> Rest looks OK to me, although I am not sure how this behaves with
>>> iCMS, as I have forgotten that part of the
>>> code.
>>>
>>> Finally, in current code (before these changes) there are two callers
>>> of the former wait_for_cms_lock() method,
>>> one here in sleepBeforeNextCycle() and one from the precleaning loop.
>>> I think the right thing has been done
>>> in terms of leaving the latter alone.
>>>
>>> It would be good if this were checked with CMSInitiatingOccupancy set
>>> to 0 (or a small value), CMSWaitDuration set to 0,
>>> -+PromotionFailureALot and checking that (1) it does not deadlock (2)
>>> CMS cycles start very soon after the end of
>>> a scavenge (and not at random times as Michal has observed earlier,
>>> although i am guessing that is difficult to test).
>>> It would be good to repeat the above test with iCMS as well.
>>>
>>> thanks!
>>> -- ramki
>>>
>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna wrote:
>>>
>>>> Thanks Jon for the pointer:
>>>>
>>>>
>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu wrote:
>>>>
>>>>>
>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>>>
>>>>>> The high level idea looks correct. I'll look at the details in a bit (seriously this time; sorry it dropped off my plate last time I promised).
>>>>>> Does anyone have a pointer to the related discussion thread on this aias from earlier in the year, by chance, so one could refresh one's
>>>>>> memory of that discussion?
>>>>>>
>>>>> subj: CMSWaitDuration unstable behavior
>>>>>
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html
>>>>>
>>>>>
>>>>>
>>>> also: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
>>>>
>>>> On to it later this afternoon, and TTYL w/review.
>>>> - ramki
>>>>
More information about the hotspot-gc-dev
mailing list