CMS parallel initial mark

Thu May 30 01:37:49 UTC 2013

Hiroshi,

I'm still reviewing the changes but so far this looks
very promising.  I've patched your changes into a repository
and started running a few tests.  I've turned on

-XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways

Thanks.

Jon

On 5/28/2013 5:24 PM, Hiroshi Yamauchi wrote:
> Hi,
>
> I'd like to have the following contributed if it makes sense.
>
> 1) Here's a patch (against a recent revision of the hsx/hotspot-gc repo):
>
>    http://cr.openjdk.java.net/~hiroshi/webrevs/cmsparinitmark/webrev.00/
>
> that implements a parallel version of the initial mark phase of the
> CMS collector. It's a relatively straightforward parallelization of
> the existing single-threaded code. With the above patch, I see about
> a 3-6x speedup in the initial mark pause times.
>
> 2) Now, here's a related issue and a suggested fix/patch for it:
>
> I see that the initial mark and remark pause times sometimes spike
> with a large young generation. For example, under a 1 GB young gen / 3
> GB heap setting, they occasionally spike up to ~500 milliseconds from
> the normal < 100 ms range, on my machine. As far as I can tell, this
> happens when the eden is fairly full (> 700 MB) but not sufficiently
> divided up, so the parallelism decreases (in the worst case it
> becomes almost single-threaded).
>
> Here's the suggested fix as a separate patch:
>
>    http://cr.openjdk.java.net/~hiroshi/webrevs/edenchunks/webrev.00/
>
> that attempts to improve on this issue by implementing an alternative
> way of dividing up the eden into chunks for increased parallelism
> (or better load balancing between the GC threads) for the young gen
> scan portion of the remark phase (and the now-parallelized initial
> mark phase). It uses a CAS-based mechanism that samples the object
> boundaries in the eden space on the slow allocation code paths (e.g. at
> the TLAB refill and large object allocation times) at all times.
>
> This approach is in contrast to the original mechanism that samples
> object boundaries in the eden space asynchronously during the preclean
> phase. I think the reason that the above issue happens is that when
> the young generation is large, a large portion of the eden space could
> get filled/allocated outside of the preclean phase (or a concurrent
> collection) and the object boundaries do not get sampled
> often/regularly enough. Also, the original mechanism isn't well suited
> to the parallel initial mark because the initial mark phase, unlike
> the remark phase, isn't preceded by the preclean phase. According to
> the DaCapo benchmarks, this alternative sampling mechanism has no
> noticeable runtime overhead despite being engaged at all times.
>
> With this patch, I see that the (parallel) initial mark and remark
> pause times stay below 100 ms (no spikes) under the same setting.
>
> Both of these features/flags are disabled by default. You're welcome
> to handle the two patches separately.
>
> Thanks,
> Hiroshi
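
For readers following the thread, the CAS-based sampling idea described
in 2) above can be sketched roughly as follows. This is only an
illustration in Java, not the code in the webrev (which is HotSpot C++).
The class name EdenChunkSampler, its methods, the fixed-size array and
the "grain" spacing parameter are all assumptions made up for this
example. The point it tries to show is that a thread on a slow
allocation path attempts a single CAS to claim the sampler and simply
skips sampling if another thread already holds it, so allocation never
blocks on the sampler.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: records object-boundary addresses in eden so that a
// later pause can split the eden scan into chunks for the GC threads.
class EdenChunkSampler {
    private final long[] boundaries;   // sampled boundaries, in ascending order
    private final long grain;          // minimum distance between two samples
    private int count = 0;             // only touched while 'busy' is held
    private final AtomicBoolean busy = new AtomicBoolean(false);

    EdenChunkSampler(int capacity, long grain) {
        this.boundaries = new long[capacity];
        this.grain = grain;
    }

    // Called on a slow allocation path (e.g. a TLAB refill) with the current
    // top of eden. Returns true if the boundary was recorded.
    boolean sample(long top) {
        // One CAS attempt: if another thread is sampling, give up right away
        // rather than wait, so the allocation path is never blocked.
        if (!busy.compareAndSet(false, true)) {
            return false;
        }
        try {
            boolean farEnough = count == 0 || top - boundaries[count - 1] >= grain;
            if (count < boundaries.length && farEnough) {
                boundaries[count++] = top;
                return true;
            }
            return false;
        } finally {
            busy.set(false);           // release the sampler
        }
    }

    // Intended to be called during a pause, when no thread is sampling.
    long[] snapshot() {
        return Arrays.copyOf(boundaries, count);
    }

    // Intended to be called once eden has been emptied by a young collection.
    void reset() {
        count = 0;
    }
}
```

With a grain of 100, sampling at addresses 1000, 1050 and 1100 would
record 1000 and 1100 and drop 1050 as too close to the previous sample.
A pause could then hand the ranges between consecutive recorded
boundaries to separate GC threads.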