G1 question: concurrent cleaning of dirty cards

John Cuthbertson john.cuthbertson at oracle.com
Fri Jun 28 20:53:24 UTC 2013


Hi Igor,

Yeah, G1 has that facility right now. In fact you added it. :) When the 
number of completed buffers is below the green zone upper limit, none of 
the refinement threads are refining buffers. That is, the green zone 
upper limit is the number of buffers that we expect to be able to 
process during the GC without going over some percentage of the pause 
time (I think the default is 10%). When the number of buffers grows 
above the green zone upper limit, the refinement threads start 
processing the buffers in a stepped manner.
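
In pseudo-code, the stepped activation looks roughly like this (an 
illustrative sketch only - the thresholds, step size, and helper names 
are assumptions, not the actual ConcurrentG1Refine code):

  #include <cstddef>

  extern size_t completed_buffers();          // global completed-buffer count
  extern void refine_one_completed_buffer();  // drain one buffer of dirty cards
  extern bool should_terminate();
  extern void sleep_briefly();                // a real worker blocks on a monitor

  const size_t green_zone = 16;  // cf. -XX:G1ConcRefinementGreenZone
  const size_t step = 4;         // assumed per-thread step

  // Worker t only runs while the backlog is above its own threshold, so
  // as the backlog grows, workers come online one "step" at a time.
  void refinement_worker(size_t t) {
    const size_t threshold = green_zone + t * step;
    while (!should_terminate()) {
      if (completed_buffers() > threshold) {
        refine_one_completed_buffer();
      } else {
        sleep_briefly();
      }
    }
  }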

So during the safepoint we would process N - green-zone-upper-limit 
completed buffers. In fact we could have a watcher task that monitors 
the number of completed buffers and triggers a safepoint when the number 
of completed buffers becomes sufficiently high - say above the 
yellow-zone upper limit.
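
A minimal sketch of that watcher, with made-up names (there is no such 
VM operation in HotSpot today - this is just the shape of the idea):

  #include <cstddef>

  extern size_t completed_buffers();
  // Hypothetical: requests a safepoint and drains completed buffers
  // there, leaving "keep" of them behind for the next GC pause to absorb.
  extern void request_safepoint_and_drain(size_t keep);

  const size_t green_zone_upper  = 16;  // budgeted into the pause
  const size_t yellow_zone_upper = 64;  // trigger threshold

  // Run periodically, e.g. from a PeriodicTask.
  void watcher_tick() {
    if (completed_buffers() > yellow_zone_upper) {
      // Process N - green-zone-upper-limit buffers at the safepoint.
      request_safepoint_and_drain(green_zone_upper);
    }
  }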

Processing the buffers at safepoints like that would do away with the 
whole notion of concurrent refinement, but it would remove a lot of the 
nasty, complicated code that gets executed by the mutators or 
refinement threads.

My main concern is that we would potentially be increasing the number 
and duration of non-GC safepoints, which would cause issues for 
latency-sensitive apps. For workloads that only care about the latency 
of 90% of their transactions, this approach would probably be fine.

We would need to evaluate the performance of each approach.

The card cache delays the processing of cards that have been dirtied 
multiple times, so it does act somewhat like a buffer, reducing the 
potential for this issue.
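
To make that concrete: the core of the card cache is a small ring 
buffer of recently dirtied cards. Inserting a card evicts the oldest 
one, which then gets refined, while a card that keeps getting redirtied 
stays cached and is processed only once, later. A rough sketch (names 
and sizes are illustrative, not the actual G1 hot card cache code):

  #include <cstdint>
  #include <cstddef>

  const size_t cache_size = 1024;     // assumed capacity
  static uint8_t* cards[cache_size];  // cached card addresses (or NULL)
  static size_t next_evict = 0;

  // Returns the evicted card (to be refined by the caller), or NULL if
  // the slot was empty. The inserted card sits in the cache until it is
  // evicted, absorbing further dirtying of the same card in between.
  uint8_t* cache_insert(uint8_t* card) {
    size_t idx = next_evict++ % cache_size;
    uint8_t* evicted = cards[idx];
    cards[idx] = card;
    return evicted;
  }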

JohnC

On 6/28/2013 12:47 PM, Igor Veresov wrote:
> The impact on the next collection, however, can be bounded. Say, if 
> you make it have a safepoint to reap the buffers when the number of 
> buffers reaches $n$, that alone would put a cap on the potential 
> pause incurred during the collection. The card cache currently has 
> the same effect, sort of, right?
>
> igor
>
> On Jun 28, 2013, at 12:26 PM, John Cuthbertson 
> <john.cuthbertson at oracle.com> wrote:
>
>> Hi Igor,
>>
>> On 6/28/2013 9:47 AM, Igor Veresov wrote:
>>>
>>> On Jun 28, 2013, at 7:08 AM, "Doerr, Martin" 
>>> <martin.doerr at sap.com> wrote:
>>>
>>>> Hi Igor,
>>>> we didn’t find an easy and feasible way to ensure the ordering, either.
>>>> Grabbing the buffers and cleaning the cards at safepoints might be 
>>>> the best solution.
>>>
>>> Would anybody from the G1 team like to think about that?
>>
>> I've been thinking about this issue on and off for the last few 
>> weeks, when I get the time. I mentioned it to Vladimir a couple of 
>> times to get his input.
>>
>>>> Maybe removing the barrier that flushes the store to the cardtable 
>>>> makes the problem more likely to occur.
>>>> I guess the purpose of the barrier was exactly to avoid this problem
>>>> (which should be working perfectly if the post barriers had 
>>>> StoreLoad barriers, too).
>>>
>>> Yeah, but as you noted, that would have a horrific effect on 
>>> performance. So, it's probably best to batch the work up to at 
>>> least eliminate the need for extra work when, say, you're looping 
>>> and storing to a limited working set (G1 uses the card table 
>>> basically for that purpose). The safepoint approach will likely 
>>> require more memory for buffers and the load will be spiky, and if 
>>> the collection were to happen right after we grabbed the buffers, 
>>> the collector would have to process all of them, which is not going 
>>> to work well for predictability. But nothing better comes to mind 
>>> at this point.
>>> Btw, there are already periodic safepoints to do biased-locking 
>>> revocations, so maybe it would make sense to piggyback on those.
>>
>> Piggybacking on the other safepoint operations might work if they 
>> happen frequently enough, but I don't know if that's the case. And 
>> as you said, even then there will be times where we haven't had a 
>> safepoint for a while and will have a ton of buffers to process at 
>> the start of the pause.
>>
>> It might be worth adding a suitable memory barrier to the G1 post 
>> write barrier and evaluating the throughput hit.
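
For reference, the post write barrier in question looks conceptually 
like the following. This is only a simplified sketch, not the actual 
HotSpot code - the constants and helper names are approximations, and 
the fence shown is the one on the filtering path that Martin is 
referring to:

  #include <cstdint>

  extern uint8_t* card_table_base;     // byte map base
  extern void storeload_fence();       // e.g. mfence / locked add on x86
  extern void enqueue(uint8_t* card);  // thread-local dirty card queue

  const int region_shift = 20;         // log2 of region size, illustrative
  const int card_shift   = 9;          // 512-byte cards
  const uint8_t dirty_card = 0;
  const uint8_t young_card = 8;        // "g1_young_gen" value, approximate

  // Runs after the store "*field = new_val".
  void post_barrier(void** field, void* new_val) {
    // filter out same-region stores and stores of NULL
    if ((((uintptr_t)field ^ (uintptr_t)new_val) >> region_shift) == 0) return;
    if (new_val == nullptr) return;
    uint8_t* card = card_table_base + ((uintptr_t)field >> card_shift);
    if (*card == young_card) return;   // young regions need no refinement
    storeload_fence();                 // order the ref store with the card load
    if (*card != dirty_card) {
      *card = dirty_card;              // dirty the card
      enqueue(card);                   // hand the card to refinement
    }
  }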
>>
>> JohnC
>
