CMS parallel initial mark

Fri Jun 7 21:52:16 UTC 2013

On 6/7/2013 12:09 PM, Hiroshi Yamauchi wrote:
> Thanks for creating that.
It's an old bug really.  For the partitioning during young allocation,
we can use

6990419 CMS: Remaining work for 6572569: consistently skewed work 
distribution in (long) re-mark pauses

Another old one.

Were these changes reviewed internally by any openjdk
hotspot members?

Jon

> On Thu, Jun 6, 2013 at 10:12 AM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>
>>   Hiroshi,
>>
>> The CR for this changeset will be 6412968.
>>
>> CMS: Long initial mark pauses
>>
>> Jon
>>
>>
>> On 5/30/13 10:56 AM, Hiroshi Yamauchi wrote:
>>
>> Thanks, Jon. Please let me know when you know more.
>>
>> On Wed, May 29, 2013 at 6:37 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>
>>   Hiroshi,
>>
>> I'm still reviewing the changes but so far this looks
>> very promising.  I've patched your changes into a repository
>> and started running a few tests.  I've turned on
>>
>> -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways
>>
>> Thanks.
>>
>> Jon
>>
>>
>> On 5/28/2013 5:24 PM, Hiroshi Yamauchi wrote:
>>
>>   Hi,
>>
>> I'd like to have the following contributed if it makes sense.
>>
>> 1) Here's a patch (against a recent revision of the hsx/hotspot-gc repo):
>>
>>     http://cr.openjdk.java.net/~hiroshi/webrevs/cmsparinitmark/webrev.00/
>>
>> that implements a parallel version of the initial mark phase of the
>> CMS collector. It's a relatively straightforward parallelization of
>> the existing single-threaded code. With the above patch, I see about
>> a 3-6x speedup in the initial mark pause times.
>>
>> 2) Now, here's a related issue and a suggested fix/patch for it:
>>
>> I see that the initial mark and remark pause times sometimes spike
>> with a large young generation. For example, under a 1 GB young gen / 3
>> GB heap setting, they occasionally spike up to ~500 milliseconds from
>> the normal < 100 ms range, on my machine. As far as I can tell, this
>> happens when the eden is fairly full (> 700 MB occupied) but not
>> sufficiently divided up, so the parallelism decreases (in the worst
>> case it becomes almost single-threaded).
>>
>> Here's a suggested fix in a separate patch:
>>
>>     http://cr.openjdk.java.net/~hiroshi/webrevs/edenchunks/webrev.00/
>>
>> that attempts to improve on this issue by implementing an alternative
>> way of dividing up the eden into chunks for increased parallelism
>> (or better load balancing between the GC threads) for the young gen
>> scan portion of the remark phase (and the now-parallelized initial
>> mark phase). It uses a CAS-based mechanism that samples the object
>> boundaries in the eden space on the slow allocation code paths (e.g., at
>> the TLAB refill and large object allocation times) at all times.
>>
>> This approach is in contrast to the original mechanism that samples
>> object boundaries in the eden space asynchronously during the preclean
>> phase. I think the reason that the above issue happens is that when
>> the young generation is large, a large portion of the eden space could
>> get filled/allocated outside of the preclean phase (or a concurrent
>> collection) and the object boundaries do not get sampled
>> often/regularly enough. Also, the original mechanism isn't well suited
>> to the parallel initial mark because the initial mark phase, unlike the
>> remark phase, isn't preceded by the preclean phase. According to the
>> Dacapo benchmarks, this alternative sampling mechanism does not have
>> noticeable runtime overhead despite being engaged at all times.
>>
>> With this patch, I see that the (parallel) initial mark and remark
>> pause times stay below 100 ms (no spikes) under the same setting.
>>
>> Both of these features/flags are disabled by default. You're welcome
>> to handle the two patches separately.
>>
>> Thanks,
>> Hiroshi
>>
>>
>>
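
For readers following the thread, the CAS-based eden sampling described in
the quoted proposal can be illustrated roughly as follows. This is a
minimal standalone sketch and not the code in the webrev: the class name
EdenChunkSampler, its methods, the fixed-capacity boundary array, and the
minimum-gap parameter are all hypothetical and chosen only for
illustration. It assumes a thread on a slow allocation path claims the
sampler with a compare-and-swap, records an object start address if it is
far enough past the previous sample, and that GC workers later split the
eden at the recorded boundaries during a pause.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch, not HotSpot code: every name here is illustrative.
class EdenChunkSampler {
 public:
  EdenChunkSampler(std::size_t capacity, std::uintptr_t min_gap_bytes)
      : boundaries_(capacity), min_gap_(min_gap_bytes) {}

  // Called from a slow allocation path (e.g. a TLAB refill) with the
  // address where the next object starts. Returns true if it was recorded.
  bool sample(std::uintptr_t object_start) {
    bool expected = false;
    // Claim the sampler with a CAS; a thread that loses the race simply
    // skips sampling instead of blocking the allocation path.
    if (!busy_.compare_exchange_strong(expected, true,
                                       std::memory_order_acquire)) {
      return false;
    }
    bool recorded = false;
    std::size_t n = count_.load(std::memory_order_relaxed);
    if (n < boundaries_.size()) {
      // Record only boundaries that are at least min_gap_ past the last one,
      // so the samples stay sorted and reasonably spread out.
      bool far_enough = (n == 0) ||
                        (object_start > boundaries_[n - 1] &&
                         object_start - boundaries_[n - 1] >= min_gap_);
      if (far_enough) {
        boundaries_[n] = object_start;
        count_.store(n + 1, std::memory_order_release);
        recorded = true;
      }
    }
    busy_.store(false, std::memory_order_release);
    return recorded;
  }

  // Called during a pause (assumes no concurrent sampling): split
  // [bottom, top) at the recorded boundaries into chunks for GC workers.
  std::vector<std::pair<std::uintptr_t, std::uintptr_t>> chunks(
      std::uintptr_t bottom, std::uintptr_t top) const {
    std::vector<std::pair<std::uintptr_t, std::uintptr_t>> result;
    std::uintptr_t start = bottom;
    std::size_t n = count_.load(std::memory_order_acquire);
    for (std::size_t i = 0; i < n; ++i) {
      std::uintptr_t b = boundaries_[i];
      if (b > start && b < top) {
        result.emplace_back(start, b);
        start = b;
      }
    }
    if (start < top) {
      result.emplace_back(start, top);
    }
    return result;
  }

  // Called after a young collection empties the eden.
  void reset() { count_.store(0, std::memory_order_release); }

 private:
  std::vector<std::uintptr_t> boundaries_;
  std::uintptr_t min_gap_;
  std::atomic<std::size_t> count_{0};
  std::atomic<bool> busy_{false};
};
```

Because a thread that fails the CAS just skips the sample, the allocation
path never waits on the sampler, which is consistent with the low overhead
reported in the proposal. Each chunk returned by chunks() starts at an
object boundary, so a worker can scan it independently of the others.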