G1 question: concurrent cleaning of dirty cards
John Cuthbertson
john.cuthbertson at oracle.com
Fri Jun 28 23:06:43 UTC 2013
Hi Igor.
You misunderstood me. I meant that if we use safepoints to refine cards,
all of the code that currently supports refinement by mutators can be
removed. That's all.
JohnC
On 6/28/2013 4:02 PM, Igor Veresov wrote:
> The mutator processing doesn't solve it. The card-clearing event is
> still asynchronous with respect to possible mutations in other
> threads. While one mutator thread is processing buffers and clearing
> cards, another can sneak in and do a store to the same object that
> will go unnoticed. So I'm afraid it's either a store-load barrier, or
> we need to stop all mutator threads to prevent this race, or worse.
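>
> A minimal sketch of that race in C++ (hypothetical names; std::atomic
> stand-ins for one card-table entry and the heap slot it covers):
>
>     #include <atomic>
>     #include <cstdint>
>
>     std::atomic<uint8_t> card{1};        // 1 = dirty, 0 = clean (assumed encoding)
>     std::atomic<void*>   slot{nullptr};  // heap word covered by 'card'
>
>     // Mutator side: post-write barrier after a reference store.
>     void mutator_store(void* ref) {
>       slot.store(ref, std::memory_order_relaxed);       // (1) heap store
>       // Without a StoreLoad fence here, (2) can see a stale 'dirty'
>       // even though the refiner has already cleaned the card.
>       if (card.load(std::memory_order_relaxed) == 1)    // (2) card check
>         return;                                         // skip dirty + enqueue
>       card.store(1, std::memory_order_relaxed);
>     }
>
>     // Refinement side: clean the card, then rescan the memory it covers.
>     void refine() {
>       card.store(0, std::memory_order_relaxed);         // (a) clean card
>       // Without a StoreLoad fence here, (b) can read the old slot value
>       // while (a) is still sitting in this CPU's store buffer.
>       void* ref = slot.load(std::memory_order_relaxed); // (b) rescan
>       (void)ref;  // ... update remembered sets from 'ref' ...
>     }
>
> If (b) reads the old value while (2) reads the stale dirty card, the
> new reference is neither rescanned by the refiner nor re-enqueued by
> the mutator, and the remembered-set update is lost. A
> std::atomic_thread_fence(std::memory_order_seq_cst) between (1) and
> (2) and between (a) and (b) closes the window, and that is exactly
> the store-load cost being weighed here.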
>
> igor
>
>
> On Jun 28, 2013, at 1:53 PM, John Cuthbertson
> <john.cuthbertson at oracle.com> wrote:
>
>> Hi Igor,
>>
>> Yeah, G1 has that facility right now. In fact you added it. :) When
>> the number of completed buffers is below the green-zone upper limit,
>> none of the refinement threads are refining buffers. That is, the
>> green-zone upper limit is the number of buffers that we expect to be
>> able to process during the GC without going over some percentage of
>> the pause time (I think the default is 10%). When the number of
>> buffers grows above the green-zone upper limit, the refinement
>> threads start processing the buffers in a stepped manner.
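>>
>> A minimal sketch of that stepped activation (names and values are
>> assumptions, not the actual HotSpot code):
>>
>>     #include <cstddef>
>>
>>     const size_t green_zone_limit = 16;  // assumed default
>>     const size_t step             = 2;   // extra buffers per worker
>>
>>     // Worker i wakes up only once the backlog exceeds its own step
>>     // above the green zone, so threads come on line one at a time
>>     // as the backlog of completed buffers grows.
>>     bool should_refine(size_t worker_id, size_t completed_buffers) {
>>       return completed_buffers > green_zone_limit + worker_id * step;
>>     }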
>>
>> So during the safepoint we would process N - green-zone-upper-limit
>> completed buffers. In fact we could have a watcher task that monitors
>> the number of completed buffers and triggers a safepoint when the
>> number of completed buffers becomes sufficiently high, say above the
>> yellow-zone upper limit.
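>>
>> A sketch of such a watcher (hypothetical names; none of these
>> functions exist in HotSpot today):
>>
>>     #include <cstddef>
>>
>>     extern size_t completed_buffer_count();        // assumed accessor
>>     extern void safepoint_and_refine(size_t n);    // assumed VM operation
>>
>>     const size_t green_zone_limit  = 16;  // assumed values
>>     const size_t yellow_zone_limit = 64;
>>
>>     // Runs periodically; stops the world only when the backlog is
>>     // high, then drains it back down to what the next GC can absorb
>>     // within its pause-time budget.
>>     void watcher_tick() {
>>       size_t n = completed_buffer_count();
>>       if (n > yellow_zone_limit)
>>         safepoint_and_refine(n - green_zone_limit);
>>     }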
>>
>> That does away with the whole notion of concurrent refinement but
>> would remove a lot of the nasty, complicated code that gets executed
>> by the mutators or refinement threads.
>>
>> My main concern is that we would potentially be increasing the
>> number and duration of non-GC safepoints, which cause issues with
>> latency-sensitive apps. For those workloads that only care about 90%
>> of the transactions, this approach would probably be fine.
>>
>> We would need to evaluate the performance of each approach.
>>
>> The card cache delays the processing of cards that have been dirtied
>> multiple times, so it does act kind of like a buffer, reducing the
>> potential for this issue.
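>>
>> A sketch of that caching idea (simplified; the real G1 hot card
>> cache differs in its details):
>>
>>     #include <cstddef>
>>     #include <cstdint>
>>
>>     const int HOT_THRESHOLD = 4;            // assumed value
>>     extern int dirty_count(uint8_t* card);  // assumed per-card counter
>>
>>     // Returns the card to refine now, or nullptr if it was parked.
>>     // Frequently re-dirtied ("hot") cards sit in a small ring buffer
>>     // and are refined only when evicted (or at the next GC), so
>>     // repeated stores to the same card don't trigger repeated work.
>>     uint8_t* filter(uint8_t* card, uint8_t** ring, size_t len, size_t* pos) {
>>       if (dirty_count(card) < HOT_THRESHOLD)
>>         return card;                    // cold card: refine immediately
>>       uint8_t* evicted = ring[*pos];    // hot card: park it in the cache
>>       ring[*pos] = card;
>>       *pos = (*pos + 1) % len;
>>       return evicted;                   // nullptr while the ring fills up
>>     }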
>>
>> JohnC
>>
>> On 6/28/2013 12:47 PM, Igor Veresov wrote:
>>> The impact on the next collection, however, can be bounded. Say, if
>>> you make it have a safepoint to reap the buffers when the number of
>>> buffers reaches N, that alone would put a cap on the potential
>>> pause incurred during the collection. The card cache currently has
>>> the same effect, sort of, right?
>>>
>>> igor
>>>
>>> On Jun 28, 2013, at 12:26 PM, John Cuthbertson
>>> <john.cuthbertson at oracle.com> wrote:
>>>
>>>> Hi Igor,
>>>>
>>>> On 6/28/2013 9:47 AM, Igor Veresov wrote:
>>>>>
>>>>> On Jun 28, 2013, at 7:08 AM, "Doerr, Martin" <martin.doerr at sap.com> wrote:
>>>>>
>>>>>> Hi Igor,
>>>>>> we didn’t find an easy and feasible way to ensure the ordering,
>>>>>> either.
>>>>>> Grabbing the buffers and cleaning the cards at safepoints might
>>>>>> be the best solution.
>>>>>
>>>>> Would anybody from the G1 team like to think about that?
>>>>
>>>> I've been thinking about this issue on and off for the last few
>>>> weeks when I get the time. I mentioned it to Vladimir a couple of
>>>> times to get his input.
>>>>
>>>>>> Maybe removing the barrier that flushes the store to the
>>>>>> cardtable makes the problem more likely to occur.
>>>>>> I guess the purpose of the barrier was exactly to avoid this
>>>>>> problem (which would work perfectly if the post barriers had
>>>>>> StoreLoad barriers, too).
>>>>>
>>>>> Yeah, but like you noted, that would have a horrific effect on
>>>>> performance. So it's probably best to bunch the work up, to at
>>>>> least eliminate the need for extra work when, say, you're looping
>>>>> and storing to a limited working set (G1 uses the cardtable
>>>>> basically for that purpose; see the sketch below). The safepoint
>>>>> approach will likely require more memory for buffers and the load
>>>>> will be spiky, and if a collection were to happen right after we
>>>>> grabbed the buffers, the collector would have to process all of
>>>>> them, which is not going to work well for predictability. But
>>>>> nothing better comes to mind at this point.
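>>>>>
>>>>> A sketch of that card-table filtering (simplified; the encoding
>>>>> and names are assumptions, not the real G1 barrier):
>>>>>
>>>>>     #include <cstdint>
>>>>>
>>>>>     const int     CARD_SHIFT = 9;       // 512-byte cards
>>>>>     const uint8_t DIRTY      = 0;       // assumed encoding
>>>>>     extern uint8_t* card_table_base;    // assumed biased base
>>>>>     extern void enqueue_card(uint8_t* card);
>>>>>
>>>>>     // Post-write barrier for a store into 'slot'. A loop storing
>>>>>     // into a small working set finds its cards already dirty and
>>>>>     // takes the early exit, so each card is dirtied and enqueued
>>>>>     // at most once between refinements.
>>>>>     void post_write_barrier(void* slot) {
>>>>>       uint8_t* card = card_table_base + ((uintptr_t)slot >> CARD_SHIFT);
>>>>>       if (*card == DIRTY) return;   // already dirty: no extra work
>>>>>       *card = DIRTY;
>>>>>       enqueue_card(card);
>>>>>     }
>>>>>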
>>>>> Btw, there are already periodic safepoints to do biased-locking
>>>>> revocations, so maybe it would make sense to piggyback on those.
>>>>
>>>> Piggybacking on all the other safepoint operations might work if
>>>> they happen frequently enough, but I don't know if that's the case.
>>>> And as you say, even then there will be times when we haven't had a
>>>> safepoint for a while and will have a ton of buffers to process at
>>>> the start of the pause.
>>>>
>>>> It might be worth adding a suitable memory barrier to the G1 post
>>>> write barrier and evaluating the throughput hit.
>>>>
>>>> JohnC
>>>
>>
>