RFR(S): 7189971: Implement CMSWaitDuration for non-incremental mode of CMS

Fri Dec 7 07:43:29 UTC 2012

Hi John --

wrt the changes posted, i see the intent of the code and agree with
it. I have a few minor suggestions on the
details of how it's implemented. My comments are inline below,
interleaved with the code:

 317 // Wait until the next synchronous GC, a concurrent full gc request,
 318 // or a timeout, whichever is earlier.
 319 void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long
t_millis) {
 320   // Wait for any cms_lock event when timeout not specified (0 millis)
 321   if (t_millis == 0) {
 322     wait_on_cms_lock(t_millis);
 323     return;
 324   }

I'd completely avoid the special case above because it would miss the
part about waiting for a
scavenge, instead dealing with that case in the code in the loop below
directly. The idea
of the "0" value is not to ask that we return immediately, but that we
wait, if necessary
forever, for a scavenge. The "0" really represents the value infinity
in that sense. This would
be in keeping with our use of wait() with a "0" value for timeout at
other places in the JVM as
well, so it's consistent.

 325
 326   GenCollectedHeap* gch = GenCollectedHeap::heap();
 327   double start_time = os::elapsedTime();
 328   double end_time = start_time + (t_millis / 1000.0);

Note how, the end_time == start_time for the special case of t_millis
== 0, so we need to treat that
case specially below.

 329
 330   // Total collections count before waiting loop
 331   unsigned int before_count;
 332   {
 333     MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
 334     before_count = gch->total_collections();
 335   }

Good.

 336
 337   while (true) {
 338     double now_time = os::elapsedTime();
 339     long wait_time_millis = (long)((end_time - now_time) * 1000.0);
 340
 341     if (wait_time_millis <= 0) {
 342       // Wait time is over
 343       break;
 344     }

Modify to:
           if (t_millis != 0) {
              if  (wait_time_millis <= 0)  {
                 // Wait time is over
                 break;
             }
          } else {
             wait_time_millis = 0;     // for use in wait() below
         }

 345
 346     // Wait until the next event or the remaining timeout
 347     {
 348       MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
 349       if (_should_terminate || _collector->_full_gc_requested) {
 350         return;
 351       }
 352       set_CMS_flag(CMS_cms_wants_token);   // to provoke notifies

insert:     assert(t_millis == 0 || wait_time_millis > 0, "Sanity");

 353       CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
 354       clear_CMS_flag(CMS_cms_wants_token);
 355       assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
 356              "Should not be set");
 357     }
 358
 359     // Extra wait time check before entering the heap lock to get
the collection count
 360     if (os::elapsedTime() >= end_time) {
 361       // Wait time is over
 362       break;
 363     }

Modify above wait time check to make an exception for t_miliis == 0:
           // Extra wait time check before checking collection count
           if (t_millis != 0 && os::elapsedTime() >= end_time) {
              // wait time exceeded
              break;
           }

 364
 365     // Total collections count after the event
 366     unsigned int after_count;
 367     {
 368       MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
 369       after_count = gch->total_collections();
 370     }
 371
 372     if (before_count != after_count) {
 373       // There was a collection - success
 374       break;
 375     }
 376   }
 377 }

While it is true that we do not have a case where the method is called
with a time of "0", I think we
want that value to be treated correctly as "infinity". For the case
where we do not want a wait at all,
we should use a small positive value, like "1 ms" to signal that
intent, i.e. -XX:CMSWaitDuration=1,
reserving CMSWaitDuration=0 to signal infinity. (We could also do that
by reserving negative values to
signal infinity, but that would make the code in the loop a bit more fiddly.)

As mentioned in my previous email, I'd like to see this tested with
CMSWaitDuration set to 0, positive and
negative values (if necessary, we can reject negative value settings),
and with ExplicitGCInvokesConcurrent.

Rest looks OK to me, although I am not sure how this behaves with
iCMS, as I have forgotten that part of the
code.

Finally, in current code (before these changes) there are two callers
of the former wait_for_cms_lock() method,
one here in sleepBeforeNextCycle() and one from the precleaning loop.
I think the right thing has been done
in terms of leaving the latter alone.

It would be good if this were checked with CMSInitiatingOccupancy set
to 0 (or a small value), CMSWaitDuration set to 0,
-+PromotionFailureALot and checking that (1) it does not deadlock (2)
CMS cycles start very soon after the end of
a scavenge (and not at random times as Michal has observed earlier,
although i am guessing that is difficult to test).
It would be good to repeat the above test with iCMS as well.

thanks!
-- ramki

On Thu, Dec 6, 2012 at 1:39 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
> Thanks Jon for the pointer:
>
>
> On Thu, Dec 6, 2012 at 1:06 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>
>>
>>
>> On 12/05/12 14:47, Srinivas Ramakrishna wrote:
>>>
>>> The high level idea looks correct. I'll look at the details in a bit (seriously this time; sorry it dropped off my plate last time I promised).
>>> Does anyone have a pointer to the related discussion thread on this aias from earlier in the year, by chance, so one could refresh one's
>>> memory of that discussion?
>>
>>
>> subj: CMSWaitDuration unstable behavior
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/thread.html
>>
>>
>
> also: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-August/004880.html
>
> On to it later this afternoon, and TTYL w/review.
> - ramki