CMS parallel initial mark

Fri Jun 7 11:23:14 UTC 2013

Hi all,

  some notes about the second patch, the modified eden boundary
sampling; in general I think the idea is good (Jon suggested that it
should become default), but imho there are some issues with the
implementation.

On Tue, 2013-05-28 at 17:24 -0700, Hiroshi Yamauchi wrote:
> Hi,
> 
> I'd like to have the following contributed if it makes sense.
> 
>
> 2) Now, here's a related issue and a suggested fix/patch for it:
> 
> I see that the initial mark and remark pause times sometimes spike
> with a large young generation. For example, under a 1 GB young gen / 3
> GB heap setting, they occasionally spike up to ~500 milliseconds from
> the normal < 100 ms range, on my machine. As far as I can tell, this
> happens when the eden is fairly occupied (> 700 MB full) and not
> sufficiently divided up and the parallelism decreases (at the worst
> case it becomes almost single-threaded.)
> 
> Here's a suggested patch in a separate patch:
> 
>   http://cr.openjdk.java.net/~hiroshi/webrevs/edenchunks/webrev.00/
> 
> that attempts to improve on this issue by implementing an alternative
> way of dividing up the eden into chunks for an increased parallelism
> (or better load balancing between the GC threads) for the young gen
> scan portion of the remark phase (and the now-parallelized initial
> mark phase.) It uses a CAS-based mechanism that samples the object
> boundaries in the eden space on the slow allocation code paths (eg. at
> the TLAB refill and large object allocation times) at all times.

- this patch is for DefNew/CMS only it seems. Is this correct?

- in the Hotspot code base explicit != NULL checks are done for whatever
reason. To keep the same style it would be nice to update the checks
whether to do the sampling ("if (CMSEdenChunksRecordAlways &&
_next_gen)") accordingly.

(the next point is not an issue of the patch)
- the check whether there is a _next_gen is odd - I do not think that
DefNew works as a generation without an older generation, but I'm not
sure.
Looking at other similar code checking for _next_gen != NULL, the
response is typically to update _next_gen and then asserting that
_next_gen is != NULL. E.g.

  if (_next_gen == NULL) {
    GenCollectedHeap* gch = GenCollectedHeap::heap();
    _next_gen = gch->next_gen(this);
    assert(_next_gen != NULL,
           "This must be the youngest gen, and not the only gen");
  }

Jon?

- I think CMSCollector::_eden_chunk_sampling_active should be volatile
to avoid any clever compilers skipping the read in sample_eden_chunk().
(Although I don't think it's possible here). See also below for avoiding
that variable completely.

- in the CMSCollector::CMSCollector() initialization list, the
initialization of _eden_chunk_sampling_active breaks the "-- ditto --"
list, which looks strange. (Actually I think the comment that a variable
may eventually be set in the ctor again later seems questionable anyway)

- instead of checking whether _eden_chunk_array is NULL to detect
whether you can sample I would prefer if the same predicate as in the
initialization (gch->supports_inline_contig_alloc()) were used.
(I think the change just copied what was done in the previous version of
concurrentMarkSweepGeneration.cpp:4515 has been done)

Maybe create a new predicate for that, used everywhere.

(btw, throughout this code there is a lot of checking whether some
NEW_C_HEAP_ARRAY returned NULL or not, and does elaborate recovery -
however by default NEW_C_HEAP_ARRAY fails with an OOM anyway... - but
that is certainly out of scope)

Same for checking whether _survivor_chunk_array is NULL - at the moment
sometimes _survivor_chunk_array is checked against NULL to avoid
entering code (in print_eden_and_survivor_chunk_arrays()).

The condition that could be used is the one that triggers allocation of
_survivor_chunk_array: (CMSParallelRemarkEnabled &&
CMSParallelSurvivorRemarkEnabled) in
concurrentMarkSweepGeneration.cpp:728.

- the code to get the lock on _eden_chunk_sampling_active seems to be
needlessly complicated.

Please consider using a dedicated Monitor in conjunction with
try_lock(). I.e.

in CMSCollector add a

Monitor * _cms_eden_sampling_lock; // initialize in the constructor

and then

if (_cms_eden_sampling_lock->try_lock()) {

  // do sampling

  _cms_eden_sampling_lock->unlock();
}

Monitor::try_lock() boils down to the exactly same code, uses a CAS to
obtain the lock. I cannot see a reason to roll an own lock
implementation here.

Hiroshi, is it possible for you to take another look at this change? It
would help us a lot. If there are questions, please do not hesitate to
ask.

Thanks a lot for your contributions so far,
Thomas