RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

Fri Feb 22 23:41:45 UTC 2013

Jon,

Right, the default value(s) will cause the benefits of this change without the user changing any flags. But the default value of 2000ms is usually very low to cover all initial-marks. The CMSWaitDuration=-1 and CMSCheckInterval=2000 will get back to the original behavior (not for iCMS as it was not supported at all). Mind that the CMSCheckInterval is currently development flag only.

Michal

-----Original Message-----
From: Jon Masamitsu [mailto:jon.masamitsu at oracle.com] 
Sent: Saturday, February 23, 2013 12:27 AM
To: Michal Frajt
Cc: hotspot-gc-dev at openjdk.java.net
Subject: Re: RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

On 2/22/2013 7:58 AM, Michal Frajt wrote:
>   
> Hi Jon,
>
> The CMSWaitDuration set to -1 disables waiting for the scavenge. In order not to invoke the "should run" method permanently there is a new development parameter CMSCheckInterval set to 1000ms. Indeed, the default CMSWaitDuration is 2000ms which might not allow the CMS concurrent cycle to start for 2000ms, the CMSCheckInterval will delay the start for 1000ms only. In a very extreme case there might be twice more CMS concurrent cycles invoked when CMSWaitDuration is set to -1.
>
> The main issue there is that the current CMS code does not handle the 
> CMSWaitDuration set to -1. The behavior is rather unexpected. Ramki 
> requested to handle it in my implementation but there is not much 
> relation to the existing handling. The patch does not require anybody 
> to
You mean that the default values will cause the benefits of this change to realized without the user changing any flags, right?

In order to get back to the original behavior, the user would set CMSWaitDuration = -1 and CMSCheckInterval= 2000

Jon
> change anything. The default CMSWaitDuration 2000ms might provide benefits on the initial-mark if young gen occurs less than every 2000ms. Everybody else might increase the CMSWaitDuration to take the action with every initial-mark. The CMSWaitDuration set to 0 might prolongate the CMS concurrent cycle start as it would wait for a real scavenge and not for any synchronize/desynchronize event like the current implementation does.
>
> CMSWaitDuration > 0 - wait for a scavenge no longer than 
> CMSWaitDuration time CMSWaitDuration = 0 - wait for a scavenge 
> CMSWaitDuration -1 - never wait for a scavenge, try every 
> CMSCheckInterval if CMS concurrent cycle should start
>
> Regards,
> Michal
>   
> Od: hotspot-gc-dev-bounces at openjdk.java.net
> Komu: hotspot-gc-dev at openjdk.java.net
> Kopie:
> Datum: Tue, 12 Feb 2013 09:15:10 -0800
> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
> non-incremental mode of CMS
>
>> Michal,
>>
>> I'm running some benchmarks (specjbb2000 and specjbb2005) and I'm 
>> seeing some differences in the number of concurrent cms cycles.  I'm 
>> comparing
>>
>> baseline vs -XX:CMSWaitDuration=-1
>>
>> and
>>
>> default value for CMSWaitDuration vs  -XX:CMSWaitDuration=-1
>>
>> I suspect it's just variations in GC but will need to dig into the GC 
>> logs more to know.
>>
>> In you testing did you notice any statistically significant 
>> differences in the number of CMS concurrent cycles that you were not expecting?
>>
>> Jon
>>
>>
>> On 01/30/13 03:35, Michal Frajt wrote:
>>> Hi John,
>>>
>>> We have verified the new webrev. The changes have been applied correctly.
>>>
>>> Regards,
>>> Michal
>>>
>>>
>>> Od: "John Cuthbertson" john.cuthbertson at oracle.com
>>> Komu: "Michal Frajt" michal at frajt.eu
>>> Kopie: hotspot-gc-dev at openjdk.java.net
>>> Datum: Fri, 25 Jan 2013 13:13:27 -0800
>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>>> non-incremental mode of CMS
>>>
>>>> Hi Michal,
>>>>
>>>> The patch is applied. The new webrev can be found at 
>>>> http://cr.openjdk.java.net/~johnc/7189971/webrev.2/
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>>
>>>> On 1/17/2013 2:58 AM, Michal Frajt wrote:
>>>>> Hi John,
>>>>>
>>>>> Please apply the attached patch to the webrev. You are right, the setting of the CMS token has been somehow moved back above the method return. Additionally I have fixed the printf of the unsigned loop counter (correct is %u).
>>>>>
>>>>> Regards,
>>>>> Michal
>>>>>
>>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>>>> Komu: hotspot-gc-dev at openjdk.java.net
>>>>> Kopie:
>>>>> Datum: Thu, 10 Jan 2013 12:20:28 -0800
>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>>>>> non-incremental mode of CMS
>>>>>
>>>>>
>>>>>> Hi Michal,
>>>>>>
>>>>>> On 1/10/2013 11:46 AM, John Cuthbertson wrote:
>>>>>>> Hi Michal,
>>>>>>>
>>>>>>> Many apologies for the delay in generating a new webrev for this 
>>>>>>> change but here is the new one:
>>>>>>> http://cr.openjdk.java.net/~johnc/7189971/webrev.1/
>>>>>>>
>>>>>>> Can you verify the webrev to make sure that changes have been 
>>>>>>> applied correctly? Looking at the new webrev it seems that the 
>>>>>>> setting of the CMS has been moved back above the return out of 
>>>>>>> the loop. Was this intentional?
>>>>>> The above should be "... setting of the CMS token has been ...".
>>>>>>
>>>>>> JohnC
>>>>>>
>>>>>>> I've done a couple of sanity tests with GCOld with 
>>>>>>> CMSWaitDuration=0 and CMSWaitDuration=1500 with CMS.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> JohnC
>>>>>>>
>>>>>>> On 12/12/2012 4:35 AM, Michal Frajt wrote:
>>>>>>>> All,
>>>>>>>> Find the attached patch. It implements proposed recommendations 
>>>>>>>> and requested changes. Please mind that the CMSWaitDuration set 
>>>>>>>> to -1 (never wait) requires new parameter CMSCheckInterval 
>>>>>>>> (develop only,
>>>>>>>> 1000 milliseconds default - constant). The parameter defines 
>>>>>>>> the next CMS cycle start check interval in the case there are 
>>>>>>>> no desynchronization (notifications) events on the CGC_lock.
>>>>>>>>
>>>>>>>> Tested with the Solaris/amd64 build CMS
>>>>>>>> + CMSWaitDuration>0 OK
>>>>>>>> + CMSWaitDuration=0 OK
>>>>>>>> + CMSWaitDuration<0 OK
>>>>>>>> iCMS
>>>>>>>> + CMSWaitDuration>0 OK
>>>>>>>> + CMSWaitDuration=0 OK
>>>>>>>> + CMSWaitDuration<0 OK
>>>>>>>> Regards,
>>>>>>>> Michal
>>>>>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>>>>>>> Komu: hotspot-gc-dev at openjdk.java.net
>>>>>>>> Kopie:
>>>>>>>> Datum: Fri, 7 Dec 2012 18:48:48 +0100
>>>>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>>>>>>>> non-incremental mode of CMS
>>>>>>>>
>>>>>>>>> Hi John/Jon/Ramki,
>>>>>>>>>
>>>>>>>>> All proposed recommendations and requested changes have been 
>>>>>>>>> implemented. We are going to test it on Monday. You will get 
>>>>>>>>> the new tested patch soon.
>>>>>>>>>
>>>>>>>>> The attached code here just got compiled, no test executed 
>>>>>>>>> yet, it might contain a bug, but you can quickly review it and 
>>>>>>>>> send your comments.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Michal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> // Wait until the next synchronous GC, a concurrent full gc 
>>>>>>>>> request, // or a timeout, whichever is earlier.
>>>>>>>>> void 
>>>>>>>>> ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>>>>>> t_millis) {
>>>>>>>>> // Wait time in millis or 0 value representing infinite wait 
>>>>>>>>> for a scavenge assert(t_millis>= 0, "Wait time for scavenge 
>>>>>>>>> should be 0 or positive");
>>>>>>>>>
>>>>>>>>> GenCollectedHeap* gch = GenCollectedHeap::heap(); double 
>>>>>>>>> start_time_secs = os::elapsedTime(); double end_time_secs = 
>>>>>>>>> start_time_secs + (t_millis / ((double) MILLIUNITS));
>>>>>>>>>
>>>>>>>>> // Total collections count before waiting loop unsigned int 
>>>>>>>>> before_count; { MutexLockerEx hl(Heap_lock, 
>>>>>>>>> Mutex::_no_safepoint_check_flag); before_count = 
>>>>>>>>> gch->total_collections(); }
>>>>>>>>>
>>>>>>>>> unsigned int loop_count = 0;
>>>>>>>>>
>>>>>>>>> while(!_should_terminate) {
>>>>>>>>> double now_time = os::elapsedTime(); long wait_time_millis;
>>>>>>>>>
>>>>>>>>> if(t_millis != 0) {
>>>>>>>>> // New wait limit
>>>>>>>>> wait_time_millis = (long) ((end_time_secs - now_time) * 
>>>>>>>>> MILLIUNITS); if(wait_time_millis<= 0) { // Wait time is over 
>>>>>>>>> break; } } else { // No wait limit, wait if necessary forever 
>>>>>>>>> wait_time_millis = 0; }
>>>>>>>>>
>>>>>>>>> // Wait until the next event or the remaining timeout { 
>>>>>>>>> MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
>>>>>>>>>
>>>>>>>>> set_CMS_flag(CMS_cms_wants_token); // to provoke notifies if 
>>>>>>>>> (_should_terminate || _collector->_full_gc_requested) { 
>>>>>>>>> return; } assert(t_millis == 0 || wait_time_millis>  0, 
>>>>>>>>> "Sanity"); CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>>>>>> wait_time_millis);
>>>>>>>>> clear_CMS_flag(CMS_cms_wants_token);
>>>>>>>>> assert(!CMS_flag_is_set(CMS_cms_has_token | 
>>>>>>>>> CMS_cms_wants_token), "Should not be set"); }
>>>>>>>>>
>>>>>>>>> // Extra wait time check before entering the heap lock to get 
>>>>>>>>> the collection count if(t_millis != 0&&  os::elapsedTime()>= 
>>>>>>>>> end_time_secs) { // Wait time is over break; }
>>>>>>>>>
>>>>>>>>> // Total collections count after the event unsigned int 
>>>>>>>>> after_count; { MutexLockerEx hl(Heap_lock, 
>>>>>>>>> Mutex::_no_safepoint_check_flag); after_count = 
>>>>>>>>> gch->total_collections(); }
>>>>>>>>>
>>>>>>>>> if(before_count != after_count) { // There was a collection - 
>>>>>>>>> success break; }
>>>>>>>>>
>>>>>>>>> // Too many loops warning
>>>>>>>>> if(++loop_count == 0) {
>>>>>>>>> warning("wait_on_cms_lock_for_scavenge() has looped %d times", 
>>>>>>>>> loop_count - 1); } } }
>>>>>>>>>
>>>>>>>>> void ConcurrentMarkSweepThread::sleepBeforeNextCycle() { while 
>>>>>>>>> (!_should_terminate) { if (CMSIncrementalMode) { icms_wait(); 
>>>>>>>>> if(CMSWaitDuration>= 0) { // Wait until the next synchronous 
>>>>>>>>> GC, a concurrent full gc // request or a timeout, whichever is 
>>>>>>>>> earlier.
>>>>>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>>>>>> }
>>>>>>>>> return;
>>>>>>>>> } else {
>>>>>>>>> if(CMSWaitDuration>= 0) {
>>>>>>>>> // Wait until the next synchronous GC, a concurrent full gc // 
>>>>>>>>> request or a timeout, whichever is earlier.
>>>>>>>>> wait_on_cms_lock_for_scavenge(CMSWaitDuration);
>>>>>>>>> } else {
>>>>>>>>> // Wait until any cms_lock event not to call 
>>>>>>>>> shouldConcurrentCollect permanently wait_on_cms_lock(0); } } 
>>>>>>>>> // Check if we should start a CMS collection cycle if 
>>>>>>>>> (_collector->shouldConcurrentCollect()) { return; } // .. 
>>>>>>>>> collection criterion not yet met, let's go back // and wait 
>>>>>>>>> some more } }
>>>>>>>>>
>>>>>>>>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>>>>>>>>> Komu: "Jon Masamitsu" jon.masamitsu at oracle.com,"John Cuthbertson"
>>>>>>>>> john.cuthbertson at oracle.com
>>>>>>>>> Kopie: hotspot-gc-dev at openjdk.java.net
>>>>>>>>> Datum: Thu, 6 Dec 2012 23:43:29 -0800
>>>>>>>>> Předmet: Re: RFR(S): 7189971: Implement CMSWaitDuration for 
>>>>>>>>> non-incremental mode of CMS
>>>>>>>>>
>>>>>>>>>> Hi John --
>>>>>>>>>>
>>>>>>>>>> wrt the changes posted, i see the intent of the code and 
>>>>>>>>>> agree with it. I have a few minor suggestions on the details 
>>>>>>>>>> of how it's implemented. My comments are inline below, 
>>>>>>>>>> interleaved with the code:
>>>>>>>>>>
>>>>>>>>>> 317 // Wait until the next synchronous GC, a concurrent full 
>>>>>>>>>> gc request,
>>>>>>>>>> 318 // or a timeout, whichever is earlier.
>>>>>>>>>> 319 void
>>>>>>>>>> ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
>>>>>>>>>> t_millis) {
>>>>>>>>>> 320 // Wait for any cms_lock event when timeout not specified
>>>>>>>>>> (0 millis)
>>>>>>>>>> 321 if (t_millis == 0) {
>>>>>>>>>> 322 wait_on_cms_lock(t_millis);
>>>>>>>>>> 323 return;
>>>>>>>>>> 324 }
>>>>>>>>>>
>>>>>>>>>> I'd completely avoid the special case above because it would 
>>>>>>>>>> miss the part about waiting for a scavenge, instead dealing 
>>>>>>>>>> with that case in the code in the loop below directly. The 
>>>>>>>>>> idea of the "0" value is not to ask that we return 
>>>>>>>>>> immediately, but that we wait, if necessary forever, for a 
>>>>>>>>>> scavenge. The "0" really represents the value infinity in 
>>>>>>>>>> that sense. This would be in keeping with our use of wait() 
>>>>>>>>>> with a "0" value for timeout at other places in the JVM as 
>>>>>>>>>> well, so it's consistent.
>>>>>>>>>>
>>>>>>>>>> 325
>>>>>>>>>> 326 GenCollectedHeap* gch = GenCollectedHeap::heap();
>>>>>>>>>> 327 double start_time = os::elapsedTime();
>>>>>>>>>> 328 double end_time = start_time + (t_millis / 1000.0);
>>>>>>>>>>
>>>>>>>>>> Note how, the end_time == start_time for the special case of 
>>>>>>>>>> t_millis == 0, so we need to treat that case specially below.
>>>>>>>>>>
>>>>>>>>>> 329
>>>>>>>>>> 330 // Total collections count before waiting loop
>>>>>>>>>> 331 unsigned int before_count;
>>>>>>>>>> 332 {
>>>>>>>>>> 333 MutexLockerEx hl(Heap_lock, 
>>>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>>>> 334 before_count = gch->total_collections();
>>>>>>>>>> 335 }
>>>>>>>>>>
>>>>>>>>>> Good.
>>>>>>>>>>
>>>>>>>>>> 336
>>>>>>>>>> 337 while (true) {
>>>>>>>>>> 338 double now_time = os::elapsedTime();
>>>>>>>>>> 339 long wait_time_millis = (long)((end_time - now_time) * 
>>>>>>>>>> 1000.0);
>>>>>>>>>> 340
>>>>>>>>>> 341 if (wait_time_millis<= 0) {
>>>>>>>>>> 342 // Wait time is over
>>>>>>>>>> 343 break;
>>>>>>>>>> 344 }
>>>>>>>>>>
>>>>>>>>>> Modify to:
>>>>>>>>>> if (t_millis != 0) {
>>>>>>>>>> if (wait_time_millis<= 0) {
>>>>>>>>>> // Wait time is over
>>>>>>>>>> break;
>>>>>>>>>> }
>>>>>>>>>> } else {
>>>>>>>>>> wait_time_millis = 0; // for use in wait() below }
>>>>>>>>>>
>>>>>>>>>> 345
>>>>>>>>>> 346 // Wait until the next event or the remaining timeout
>>>>>>>>>> 347 {
>>>>>>>>>> 348 MutexLockerEx x(CGC_lock, 
>>>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>>>> 349 if (_should_terminate || _collector->_full_gc_requested) 
>>>>>>>>>> {
>>>>>>>>>> 350 return;
>>>>>>>>>> 351 }
>>>>>>>>>> 352 set_CMS_flag(CMS_cms_wants_token); // to provoke notifies
>>>>>>>>>>
>>>>>>>>>> insert: assert(t_millis == 0 || wait_time_millis>  0, 
>>>>>>>>>> "Sanity");
>>>>>>>>>>
>>>>>>>>>> 353 CGC_lock->wait(Mutex::_no_safepoint_check_flag,
>>>>>>>>>> wait_time_millis);
>>>>>>>>>> 354 clear_CMS_flag(CMS_cms_wants_token);
>>>>>>>>>> 355 assert(!CMS_flag_is_set(CMS_cms_has_token | 
>>>>>>>>>> CMS_cms_wants_token),
>>>>>>>>>> 356 "Should not be set");
>>>>>>>>>> 357 }
>>>>>>>>>> 358
>>>>>>>>>> 359 // Extra wait time check before entering the heap lock to 
>>>>>>>>>> get the collection count
>>>>>>>>>> 360 if (os::elapsedTime()>= end_time) {
>>>>>>>>>> 361 // Wait time is over
>>>>>>>>>> 362 break;
>>>>>>>>>> 363 }
>>>>>>>>>>
>>>>>>>>>> Modify above wait time check to make an exception for t_miliis == 0:
>>>>>>>>>> // Extra wait time check before checking collection count if 
>>>>>>>>>> (t_millis != 0&&  os::elapsedTime()>= end_time) { // wait 
>>>>>>>>>> time exceeded break; }
>>>>>>>>>>
>>>>>>>>>> 364
>>>>>>>>>> 365 // Total collections count after the event
>>>>>>>>>> 366 unsigned int after_count;
>>>>>>>>>> 367 {
>>>>>>>>>> 368 MutexLockerEx hl(Heap_lock, 
>>>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>>>> 369 after_count = gch->total_collections();
>>>>>>>>>> 370 }
>>>>>>>>>> 371
>>>>>>>>>> 372 if (before_count != after_count) {
>>>>>>>>>> 373 // There was a collection - success
>>>>>>>>>> 374 break;
>>>>>>>>>> 375 }
>>>>>>>>>> 376 }
>>>>>>>>>> 377 }
>>>>>>>>>>
>>>>>>>>>> While it is true that we do not have a case where the method 
>>>>>>>>>> is called with a time of "0", I think we want that value to 
>>>>>>>>>> be treated correctly as "infinity". For the case where we do 
>>>>>>>>>> not want a wait at all, we should use a small positive value, 
>>>>>>>>>> like "1 ms" to signal that intent, i.e. 
>>>>>>>>>> -XX:CMSWaitDuration=1, reserving CMSWaitDuration=0 to signal 
>>>>>>>>>> infinity. (We could also do that by reserving negative values 
>>>>>>>>>> to signal infinity, but that would make the code in the loop 
>>>>>>>>>> a bit more fiddly.)
>>>>>>>>>>
>>>>>>>>>> As mentioned in my previous email, I'd like to see this 
>>>>>>>>>> tested with CMSWaitDuration set to 0, positive and negative 
>>>>>>>>>> values (if necessary, we can reject negative value settings), 
>>>>>>>>>> and with ExplicitGCInvokesConcurrent.
>>>>>>>>>>
>>>>>>>>>> Rest looks OK to me, although I am not sure how this behaves 
>>>>>>>>>> with iCMS, as I have forgotten that part of the code.
>>>>>>>>>>
>>>>>>>>>> Finally, in current code (before these changes) there are two 
>>>>>>>>>> callers of the former wait_for_cms_lock() method, one here in 
>>>>>>>>>> sleepBeforeNextCycle() and one from the precleaning loop.
>>>>>>>>>> I think the right thing has been done in terms of leaving the 
>>>>>>>>>> latter alone.
>>>>>>>>>>
>>>>>>>>>> It would be good if this were checked with 
>>>>>>>>>> CMSInitiatingOccupancy set to 0 (or a small value), 
>>>>>>>>>> CMSWaitDuration set to 0,
>>>>>>>>>> -+PromotionFailureALot and checking that (1) it does not 
>>>>>>>>>> -+deadlock (2)
>>>>>>>>>> CMS cycles start very soon after the end of a scavenge (and 
>>>>>>>>>> not at random times as Michal has observed earlier, although 
>>>>>>>>>> i am guessing that is difficult to test).
>>>>>>>>>> It would be good to repeat the above test with iCMS as well.
>>>>>>>>>>
>>>>>>>>>> thanks!
>>>>>>>>>> -- ramki
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna wrote:
>>>>>>>>>>> Thanks Jon for the pointer:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu wrote:
>>>>>>>>>>>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>>>>>>>>>>> The high level idea looks correct. I'll look at the 
>>>>>>>>>>>>> details in a bit (seriously this time; sorry it dropped 
>>>>>>>>>>>>> off my plate last time I promised).
>>>>>>>>>>>>> Does anyone have a pointer to the related discussion 
>>>>>>>>>>>>> thread on this aias from earlier in the year, by chance, 
>>>>>>>>>>>>> so one could refresh one's memory of that discussion?
>>>>>>>>>>>> subj: CMSWaitDuration unstable behavior
>>>>>>>>>>>>
>>>>>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-
>>>>>>>>>>>> August/thread.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> also:
>>>>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-A
>>>>>>>>>>> ugust/004880.html
>>>>>>>>>>>
>>>>>>>>>>> On to it later this afternoon, and TTYL w/review.
>>>>>>>>>>> - ramki