<div dir="ltr"><div>Hi,</div><div><br></div><div>I've seen a behavior with CMS precleaning which results in less deterministic remark pauses. Some background...</div><div><br></div><div>CMS is precleaning and attempts to schedule the remark phase when eden occupancy has reached a configurable percentage - default is 50%. If not enough cards were precleaned (<100 cards) during an iteration, the phase backs off and sleeps for a configurable number of millis - default 100ms.</div>
<div><br></div><div>The problem I've found occurs when not enough cards were cleaned and we have already reached the stop-precleaning threshold - a sleep of 100ms then occurs, which means that eden keeps filling up for 100ms and even more cards get dirty - cards which the concurrent precleaning will <u>not</u> deal with. Then we exit precleaning and schedule remark.</div>
<div><br></div><div>So my suggestion to reduce this risk of indeterminism is that if the predicate (workdone < CMSAbortablePrecleanMinWorkPerIteration) is true, then we also check if we should continue precleaning by invoking should_abort_preclean() - if we should stop, we don't sleep either - we simply exit precleaning and schedule remark phase. If we shouldn't stop, then we sleep.</div>
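<div><br></div><div>To make the idea concrete, here is a minimal, self-contained sketch of the control flow I have in mind. It is a simplified model and not HotSpot code: PrecleanConfig, PrecleanStats, run_abortable_preclean and the recheck_before_sleep switch are hypothetical names, and the three callbacks merely stand in for preclean_work(), should_abort_preclean() and wait_on_cms_lock(). The loop and time limits of the real method are left out.</div>

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-ins for the HotSpot flags; the defaults mirror the
// values quoted in this mail.
struct PrecleanConfig {
  std::size_t min_work_per_iteration = 100;  // CMSAbortablePrecleanMinWorkPerIteration
  unsigned wait_millis = 100;                // CMSAbortablePrecleanWaitMillis
};

// Counters similar to what PrintCMSStatistics reports.
struct PrecleanStats {
  std::size_t loops = 0;
  std::size_t waits = 0;
  std::size_t cards = 0;
  unsigned slept_millis = 0;
};

// Simplified model of the abortable preclean loop.
//   preclean_work: returns the number of cards cleaned in one iteration
//   should_abort:  true once the remark should be scheduled
//   sleep_for:     backs off for the given number of milliseconds
// With recheck_before_sleep == false the loop behaves like the current code;
// with true it applies the proposed extra check before backing off.
PrecleanStats run_abortable_preclean(
    const PrecleanConfig& cfg,
    const std::function<std::size_t()>& preclean_work,
    const std::function<bool()>& should_abort,
    const std::function<void(unsigned)>& sleep_for,
    bool recheck_before_sleep) {
  PrecleanStats stats;
  while (!should_abort()) {
    const std::size_t workdone = preclean_work();
    stats.cards += workdone;
    stats.loops++;
    if (workdone < cfg.min_work_per_iteration) {
      // Proposed change: if we are about to leave precleaning anyway,
      // skip the sleep and go straight to scheduling the remark.
      if (recheck_before_sleep && should_abort()) {
        break;
      }
      sleep_for(cfg.wait_millis);
      stats.waits++;
      stats.slept_millis += cfg.wait_millis;
    }
  }
  return stats;
}
```

<div>If the abort condition becomes true during the work step of a low-yield iteration, the current behaviour (recheck_before_sleep == false) still sleeps once before noticing it, while the proposed variant leaves the loop without any wait.</div>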
<div><br></div><div>I've tried reducing the configurable values of time-to-sleep and min-work-per-precleaning-iteration, and this yields significantly more deterministic remark pauses. According to the logs, the indeterminism occurs when not enough work was done and we should exit precleaning. However, I don't think it is a good idea to let the precleaning spin without doing much work.</div>
<div><br></div><div>So, what do you say - should we fix this? Below is the code and my comments are in <b><font color="#ff0000">red</font></b>.</div><div><br></div><div><br></div><div>Best Regards,</div><div>Gustav Åkesson</div>
<div><br></div><div>----------------</div><div><br></div><div><h3 style="font-size:13px;color:rgb(0,0,0);font-family:'DejaVu Sans','Bitstream Vera Sans','Luxi Sans',Verdana,sans-serif;line-height:15px">
concurrentMarkSweepGeneration.cpp</h3></div><div><br></div><div>// Try and schedule the remark such that young gen</div><div>// occupancy is CMSScheduleRemarkEdenPenetration %.</div><div>void CMSCollector::abortable_preclean() {</div>
<div> check_correct_thread_executing();</div><div> assert(CMSPrecleaningEnabled, "Inconsistent control state");</div><div> assert(_collectorState == AbortablePreclean, "Inconsistent control state");</div>
<div><br></div><div> // If Eden's current occupancy is below this threshold,</div><div> // immediately schedule the remark; else preclean</div><div> // past the next scavenge in an effort to</div><div> // schedule the pause as described above. By choosing</div>
<div> // CMSScheduleRemarkEdenSizeThreshold >= max eden size</div><div> // we will never do an actual abortable preclean cycle.</div><div> if (get_eden_used() > CMSScheduleRemarkEdenSizeThreshold) {</div><div> TraceCPUTime tcpu(PrintGCDetails, true, gclog_or_tty);</div>
<div> CMSPhaseAccounting pa(this, "abortable-preclean", _gc_tracer_cm->gc_id(), !PrintGCDetails);</div><div> // We need more smarts in the abortable preclean</div><div> // loop below to deal with cases where allocation</div>
<div> // in young gen is very very slow, and our precleaning</div><div> // is running a losing race against a horde of</div><div> // mutators intent on flooding us with CMS updates</div><div> // (dirty cards).</div>
<div> // One, admittedly dumb, strategy is to give up</div><div> // after a certain number of abortable precleaning loops</div><div> // or after a certain maximum time. We want to make</div><div> // this smarter in the next iteration.</div>
<div> // XXX FIX ME!!! YSR</div><div> size_t loops = 0, workdone = 0, cumworkdone = 0, waited = 0;</div><div> while (!(should_abort_preclean() ||</div><div> ConcurrentMarkSweepThread::should_terminate())) {</div>
<div> <b>workdone = preclean_work(CMSPrecleanRefLists2, CMSPrecleanSurvivors2); <font color="#ff0000">// Here we do some "heavy" work</font></b></div><div> cumworkdone += workdone;</div><div> loops++;</div>
<div> // Voluntarily terminate abortable preclean phase if we have</div><div> // been at it for too long.</div><div> if ((CMSMaxAbortablePrecleanLoops != 0) &&</div><div> loops >= CMSMaxAbortablePrecleanLoops) {</div>
<div> if (PrintGCDetails) {</div><div> gclog_or_tty->print(" CMS: abort preclean due to loops ");</div><div> }</div><div> break;</div><div> }</div><div> if (pa.wallclock_millis() > CMSMaxAbortablePrecleanTime) {</div>
<div> if (PrintGCDetails) {</div><div> gclog_or_tty->print(" CMS: abort preclean due to time ");</div><div> }</div><div> break;</div><div> }</div><div> // If we are doing little work each iteration, we should</div>
<div> // take a short break.</div><div> <b><font color="#ff0000">// Here we take a break when not enough work was done.</font></b></div><div> if (workdone < CMSAbortablePrecleanMinWorkPerIteration <font color="#ff0000"><b>/* And here we should check if we should exit precleaning */</b></font>) {</div>
<div> // Sleep for some time, waiting for work to accumulate</div><div> stopTimer();</div><div> cmsThread()->wait_on_cms_lock(CMSAbortablePrecleanWaitMillis);</div><div> startTimer();</div><div>
waited++;</div><div> }</div><div> }</div><div> if (PrintCMSStatistics > 0) {</div><div> gclog_or_tty->print(" [%d iterations, %d waits, %d cards)] ",</div><div> loops, waited, cumworkdone);</div>
<div> }</div><div> }</div><div> CMSTokenSync x(true); // is cms thread</div><div> if (_collectorState != Idling) {</div><div> assert(_collectorState == AbortablePreclean,</div><div> "Spontaneous state transition?");</div>
<div> _collectorState = FinalMarking;</div><div> } // Else, a foreground collection completed this CMS cycle.</div><div> return;</div><div>}</div></div>