CMSWaitDuration unstable behavior

Jon Masamitsu jon.masamitsu at oracle.com
Wed Aug 8 23:45:18 UTC 2012



On 8/8/2012 11:56 AM, Srinivas Ramakrishna wrote:
> ...
>
> PS: Jon, if Michal takes the approach of CMSScavengeBeforeInitialMark, I'd
> say it would be useful to the broader community (not
> just ICMS users) if that were integrated into the main-line code, as it
> would be a via-media for CMS scaling in the absence of the
> piggybacking RFE which is really the best solution here.

Agreed.


Jon

> thanks!
> -- ramki
>
> On Wed, Aug 8, 2012 at 8:11 AM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>
>> Michal,
>>
>> The engineer with the most experience on CMS left Oracle
>> and I suspect this is not going to get fixed in the way you want.
>>
>> I've created CR 7189971 to capture your comments and it will be
>> reviewed along with other RFEs for CMS but I would not be
>> optimistic.
>>
>> Since you are customizing your own VM, did you consider
>> explicitly invoking a young collection before the initial mark
>> the way that it is done for the remark phase with the flag
>>
>> CMSScavengeBeforeRemark?
>>
>> Jon
>>
>>
>> On 8/7/2012 6:16 AM, Frajt, Michal wrote:
>>
>>> Hi all,
>>>
>>> We have been using the incremental CMS collector for many years. We have a
>>> distributed application framework based on the subscribe-unsubscribe model
>>> where data unsubscriptions are handled by the application layer simply
>>> dropping its strong reference to the distributed data. The underlying
>>> application framework layer uses weak references to track the data
>>> requirements of the application layer. We keep the old generation
>>> processed permanently (incrementally) to get the weak references released
>>> and reported within a short period of time (minutes).
>>>
>>> Unfortunately the incremental mode lacks support for CMSWaitDuration,
>>> which places the initial mark phase right after the young space
>>> collection. After some new gen sizing optimization we reached a situation
>>> where the new gen is more or less big enough to hold most of the live
>>> objects, with only a few promotions to the old gen. The incremental CMS is
>>> then started every minute at a random moment, with the new gen fairly full
>>> of garbage. The initial mark takes 20-50 times longer than a single new gen
>>> collection (40ms new gen, initial mark 1100ms).
>>>
>>> We decided to customize OpenJDK 6 by adding CMSWaitDuration support to
>>> the incremental mode. We took the same approach as the wait_on_cms_lock
>>> method does with the CGC_lock object. Unfortunately we realized that the
>>> CGC_lock mutex is also notified in situations other than the end of a
>>> young space collection. These unrelated notifications come from the
>>> desynchronize method invocations, and they cause wait_on_cms_lock to
>>> return earlier than required. The initial mark phase is then started
>>> before the young space collection even when the specified wait duration
>>> is long enough. We have fixed it by waiting again if the
>>> GenCollectedHeap::heap()->total_collections() counter has not changed
>>> after the CGC_lock->wait method returns, but for no longer than
>>> CMSWaitDuration in total. The initial mark is then always placed (if
>>> CMSWaitDuration is long enough) after the young space collection. Every
>>> initial mark phase takes no longer than 17ms (previously 1100ms).
>>>
>>> We also tested the CMSWaitDuration behavior in the normal CMS mode. We
>>> specified -XX:+UseCMSInitiatingOccupancyOnly and
>>> -XX:CMSInitiatingOccupancyFraction=10 to force the CMS to run permanently
>>> (shouldConcurrentCollect should be returning true). The CMS initial mark is
>>> often started without waiting for the young space collection, which
>>> makes the initial marking run 20-50 times longer. We consider this
>>> unstable behavior of the CMSWaitDuration implementation, related to the
>>> problem of the wait-notify signaling on the CGC_lock object. We disabled
>>> explicit GC invocation (-XX:+DisableExplicitGC) to be sure there is no
>>> other reason to start the CMS initial mark phase before the young space
>>> collection.
>>>
>>> Is there any plan to get the CMSWaitDuration supported in the incremental
>>> mode and/or get it fixed in the normal mode?
>>>
>>> Thanks,
>>> Michal Frajt
>>>
>>>
>>>
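
For readers following the thread, the "wait again" fix Michal describes can
be sketched roughly as below. This is not the HotSpot patch and does not use
HotSpot types: CollectionMonitor, snapshot, young_collection_done,
unrelated_notify and wait_for_young_collection are hypothetical names, with a
standard C++ mutex and condition variable standing in for CGC_lock and a plain
counter standing in for total_collections(). The idea it illustrates is only
the one from the message: treat a wakeup as real only if the collection
counter has changed, and otherwise keep waiting until the overall
CMSWaitDuration budget is spent.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for CGC_lock plus the heap's collection counter.
struct CollectionMonitor {
    std::mutex lock;
    std::condition_variable cv;
    unsigned total_collections = 0;  // plays the role of total_collections()

    // Read the counter under the lock; the caller uses it as a baseline.
    unsigned snapshot() {
        std::lock_guard<std::mutex> g(lock);
        return total_collections;
    }

    // A young collection finished: bump the counter, then notify.
    void young_collection_done() {
        {
            std::lock_guard<std::mutex> g(lock);
            ++total_collections;
        }
        cv.notify_all();
    }

    // A notification unrelated to young collections (counter unchanged).
    void unrelated_notify() { cv.notify_all(); }

    // Wait until the counter differs from `before`, re-waiting after any
    // unrelated wakeup, but never past `max_wait` in total.
    // Returns true if a young collection was observed.
    bool wait_for_young_collection(unsigned before,
                                   std::chrono::milliseconds max_wait) {
        const auto deadline = std::chrono::steady_clock::now() + max_wait;
        std::unique_lock<std::mutex> g(lock);
        while (total_collections == before) {
            if (cv.wait_until(g, deadline) == std::cv_status::timeout) {
                break;  // budget used up; caller proceeds anyway
            }
        }
        return total_collections != before;
    }
};

// Small demonstration of both outcomes.
void demo() {
    CollectionMonitor m;

    // An unrelated notify followed by a real collection: the waiter
    // returns true only because the counter moved.
    const unsigned before = m.snapshot();
    std::thread gc([&m] {
        m.unrelated_notify();
        m.young_collection_done();
    });
    const bool seen =
        m.wait_for_young_collection(before, std::chrono::seconds(5));
    gc.join();
    assert(seen);

    // Only an unrelated notify: the waiter uses its whole budget and
    // reports that no young collection happened.
    const unsigned again = m.snapshot();
    m.unrelated_notify();
    assert(!m.wait_for_young_collection(again, std::chrono::milliseconds(20)));
}
```

Taking the baseline before waiting, rather than inside the wait, means a
collection that completes between the snapshot and the wait is still counted.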


