CMS abortable preclean

Sun Jul 27 15:34:14 UTC 2014

Hi,

I've seen a behavior with CMS precleaning which results in less
deterministic remark pauses. Some background...

During abortable precleaning, CMS attempts to schedule the remark phase
when eden occupancy has reached a configurable percentage - default is 50%.
If not enough cards were precleaned (<100 cards) during an iteration, the
phase backs off and sleeps for a configurable number of millis - default
100ms.
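
To make the two conditions concrete, here is a minimal sketch in plain C++.
The function and constant names are my own stand-ins, not HotSpot
identifiers, and the values are just the defaults mentioned above:

```cpp
#include <cstddef>

// Hypothetical stand-ins for the defaults described above.
const unsigned    kRemarkEdenPercent    = 50;   // eden occupancy target, in percent
const std::size_t kMinCardsPerIteration = 100;  // below this, the phase backs off
const unsigned    kBackoffMillis        = 100;  // length of the back-off sleep

// True once eden occupancy has reached the configured percentage.
bool eden_threshold_reached(std::size_t eden_used, std::size_t eden_capacity,
                            unsigned percent = kRemarkEdenPercent) {
  return eden_used * 100 >= eden_capacity * percent;
}

// Milliseconds to sleep after an iteration; 0 if enough cards were cleaned.
unsigned backoff_millis(std::size_t cards_cleaned) {
  return cards_cleaned < kMinCardsPerIteration ? kBackoffMillis : 0;
}
```

With these defaults, an iteration that cleaned 99 cards sleeps for 100ms,
while one that cleaned 100 cards or more does not sleep at all.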

The problem I've found occurs when not enough cards were cleaned and we
have already reached the stop-precleaning threshold. The 100ms sleep still
occurs, which means that eden keeps filling up for 100ms and even more
cards get dirty - cards which concurrent precleaning will *not* deal with.
Then we exit precleaning and schedule remark.

So my suggestion to reduce this risk of indeterminism is that if the
predicate (workdone < CMSAbortablePrecleanMinWorkPerIteration) is true,
then we also check if we should continue precleaning by invoking
should_abort_preclean() - if we should stop, we don't sleep either - we
simply exit precleaning and schedule remark phase. If we shouldn't stop,
then we sleep.
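
In other words, the decision after each iteration would look roughly like
this. It is a self-contained sketch with made-up names (NextStep,
after_iteration), not a patch against concurrentMarkSweepGeneration.cpp;
the should_abort argument stands for the result of should_abort_preclean():

```cpp
#include <cstddef>

// Hypothetical model of what happens after one precleaning iteration.
enum class NextStep {
  Continue,      // go back to the top of the loop and re-check its condition
  Sleep,         // back off and wait for more dirty cards to accumulate
  ExitToRemark   // leave precleaning right away and schedule remark
};

NextStep after_iteration(std::size_t workdone,
                         std::size_t min_work_per_iteration,
                         bool should_abort) {
  if (workdone >= min_work_per_iteration) {
    return NextStep::Continue;      // enough cards cleaned, no back-off
  }
  if (should_abort) {
    return NextStep::ExitToRemark;  // threshold reached: skip the sleep
  }
  return NextStep::Sleep;           // little work and not yet time for remark
}
```

The only difference from the current behavior is the middle branch: today a
low-work iteration always sleeps, even when the abort condition already
holds.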

I've tried to reduce the configurable values of time-to-sleep and
min-work-per-precleaning-iteration, and this yields significantly more
deterministic remark pauses. According to the logs, indeterminism occurs
when not enough work was done and we should exit precleaning. However, I
don't think it is a good idea to let precleaning spin without doing much.

So, what do you say - should we fix this? Below is the code with my
comments inline.

Best Regards,
Gustav Åkesson

----------------

concurrentMarkSweepGeneration.cpp

// Try and schedule the remark such that young gen
// occupancy is CMSScheduleRemarkEdenPenetration %.
void CMSCollector::abortable_preclean() {
  check_correct_thread_executing();
  assert(CMSPrecleaningEnabled,  "Inconsistent control state");
  assert(_collectorState == AbortablePreclean,
         "Inconsistent control state");

  // If Eden's current occupancy is below this threshold,
  // immediately schedule the remark; else preclean
  // past the next scavenge in an effort to
  // schedule the pause as described above. By choosing
  // CMSScheduleRemarkEdenSizeThreshold >= max eden size
  // we will never do an actual abortable preclean cycle.
  if (get_eden_used() > CMSScheduleRemarkEdenSizeThreshold) {
    TraceCPUTime tcpu(PrintGCDetails, true, gclog_or_tty);
    CMSPhaseAccounting pa(this, "abortable-preclean",
                          _gc_tracer_cm->gc_id(), !PrintGCDetails);
    // We need more smarts in the abortable preclean
    // loop below to deal with cases where allocation
    // in young gen is very very slow, and our precleaning
    // is running a losing race against a horde of
    // mutators intent on flooding us with CMS updates
    // (dirty cards).
    // One, admittedly dumb, strategy is to give up
    // after a certain number of abortable precleaning loops
    // or after a certain maximum time. We want to make
    // this smarter in the next iteration.
    // XXX FIX ME!!! YSR
    size_t loops = 0, workdone = 0, cumworkdone = 0, waited = 0;
    while (!(should_abort_preclean() ||
             ConcurrentMarkSweepThread::should_terminate())) {
      // Here we do some "heavy" work
      workdone = preclean_work(CMSPrecleanRefLists2, CMSPrecleanSurvivors2);
      cumworkdone += workdone;
      loops++;
      // Voluntarily terminate abortable preclean phase if we have
      // been at it for too long.
      if ((CMSMaxAbortablePrecleanLoops != 0) &&
          loops >= CMSMaxAbortablePrecleanLoops) {
        if (PrintGCDetails) {
          gclog_or_tty->print(" CMS: abort preclean due to loops ");
        }
        break;
      }
      if (pa.wallclock_millis() > CMSMaxAbortablePrecleanTime) {
        if (PrintGCDetails) {
          gclog_or_tty->print(" CMS: abort preclean due to time ");
        }
        break;
      }
      // If we are doing little work each iteration, we should
      // take a short break.
      // Here we take a break when not enough work was done.
      if (workdone < CMSAbortablePrecleanMinWorkPerIteration
          /* And here we should check if we should exit precleaning */) {
        // Sleep for some time, waiting for work to accumulate
        stopTimer();
        cmsThread()->wait_on_cms_lock(CMSAbortablePrecleanWaitMillis);
        startTimer();
        waited++;
      }
    }
    if (PrintCMSStatistics > 0) {
      gclog_or_tty->print(" [%d iterations, %d waits, %d cards)] ",
                          loops, waited, cumworkdone);
    }
  }
  CMSTokenSync x(true); // is cms thread
  if (_collectorState != Idling) {
    assert(_collectorState == AbortablePreclean,
           "Spontaneous state transition?");
    _collectorState = FinalMarking;
  } // Else, a foreground collection completed this CMS cycle.
  return;
}