G1 question: concurrent cleaning of dirty cards

Fri Jun 28 16:47:36 UTC 2013

On Jun 28, 2013, at 7:08 AM, "Doerr, Martin" <martin.doerr at sap.com> wrote:

> Hi Igor,
>  
> we didn’t find an easy and feasible way to ensure the ordering, either.
> Grabbing the buffers and cleaning the cards at safepoints might be the best solution.

Would anybody from the G1 team like to think about that?

>  
> Maybe removing the barrier that flushes the store to the cardtable makes the problem more likely to occur.
> I guess the purpose of the barrier was exactly to avoid this problem
> (which should be working perfectly if the post barriers had StoreLoad barriers, too).
>  

Yeah, but as you noted, that would have a horrific effect on performance. So it's probably best to batch the work, which at least avoids extra work when, say, you're looping and storing to a limited working set (G1 uses the cardtable basically for that purpose). The safepoint approach will likely require more memory for buffers, and the load will be spiky; if a collection were to happen right after we grabbed the buffers, the collector would have to process all of them, which is not going to work well for predictability. But nothing better comes to mind at this point.
Btw, there are already periodic safepoints to do biased locking revocations, so maybe it would make sense to piggyback on those.

igor

> Best regards,
> Martin
>  
>  
> From: Igor Veresov [mailto:iggy.veresov at gmail.com] 
> Sent: Thursday, 27 June 2013 16:13
> To: Doerr, Martin
> Cc: John Cuthbertson; hotspot-gc-dev at openjdk.java.net; Braun, Matthias
> Subject: Re: G1 question: concurrent cleaning of dirty cards
>  
> Yea, unless I'm forgetting something, this seems very fundamental. The probability of this happening is probably greatly reduced by the card cache; nevertheless it seems possible. The only solution that comes to mind is to do a periodic safepoint to grab the already filled buffers and clean the corresponding entries in the card table. The processing of the grabbed buffers may of course be done concurrently.
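
A rough sketch of the safepoint idea, in case it helps the discussion. All type and function names here are hypothetical and heavily simplified; none of them come from HotSpot. For simplicity the sketch takes every buffer regardless of how full it is, and it assumes the function runs only while all mutators are stopped.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types; none of these names come from HotSpot.
using CardIndex = std::size_t;

struct MutatorState {
    std::vector<CardIndex> dirty_cards;   // thread-local buffer of enqueued cards
};

struct CardTable {
    std::vector<std::uint8_t> bytes;      // 0 = clean, 1 = dirty
    explicit CardTable(std::size_t n) : bytes(n, 0) {}
};

// Assumed to run only while all mutators are stopped at a safepoint, so no
// mutator can be between its field store and its card-table load.
// Grabs every buffer, cleans the corresponding cards, and returns the cards
// so they can be scanned later, possibly concurrently.
std::vector<CardIndex> harvest_at_safepoint(std::vector<MutatorState>& mutators,
                                            CardTable& table) {
    std::vector<CardIndex> work;
    for (MutatorState& m : mutators) {
        for (CardIndex c : m.dirty_cards) {
            table.bytes[c] = 0;           // clean while nobody is storing
            work.push_back(c);
        }
        m.dirty_cards.clear();
    }
    return work;
}
```

The returned list is what the refinement threads would walk after the safepoint. Since the cards were cleaned while no store was in flight, any later store should find a clean card and enqueue it again.
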
>  
> But there doesn't seem to be an easy way to ensure the ordering between the original store to the object and the cleaning store to the card table.  The barrier that flushes the store to the cardtable doesn't in any way enforce the original store to the object from the other thread to happen before that. So the failure case would be this:
>  
> Mutator thread: 
> - store to object                                       
> - load from cardtable
> - compare the cardtable byte (it is dirty)
> - bail from barrier
>  
> Refinement thread:
> - clear the card
>  
> If clearing of the card occurs after the mutator loads the byte from the cardtable, the mutator won't enqueue the card, which is sort of the intended behavior. But there is no guarantee that the refinement thread would see the result of "store to object", in which case the information will be lost.
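
To make the interleaving concrete, here is a small deterministic toy model. It is an illustration under stated assumptions, not HotSpot code: the mutator's store sits in a private FIFO store buffer until drained, which is one way the reordering can show up, and `Memory`, `Mutator`, `refine` and `update_lost` are made-up names.

```cpp
#include <cstdint>
#include <deque>

// Toy model, not HotSpot code: shared memory has one field and one card byte.
enum Card : std::uint8_t { kClean = 0, kDirty = 1 };

struct Memory {
    int field = 0;          // stands in for x.f; holds an "oop" id
    Card card = kDirty;     // card covering x.f, dirtied by an earlier store
};

// A mutator whose stores wait in a private FIFO buffer until drained.
struct Mutator {
    std::deque<int> store_buffer;   // pending values for Memory::field
    bool enqueued = false;

    void store_field(int v) { store_buffer.push_back(v); }

    void drain(Memory& m) {
        while (!store_buffer.empty()) {
            m.field = store_buffer.front();
            store_buffer.pop_front();
        }
    }

    // Filtering part of the post barrier: only enqueue when the card is clean.
    void post_barrier(Memory& m, bool storeload_fence) {
        if (storeload_fence) drain(m);      // a fence forces the store out first
        if (m.card == kDirty) return;       // "bail from barrier"
        m.card = kDirty;
        enqueued = true;
    }
};

// Refinement: clean the card, then scan the field. Returns the value scanned.
int refine(Memory& m) {
    m.card = kClean;
    return m.field;
}

// Replays the interleaving from the mail. Returns true if the new value was
// lost: refinement scanned the old value and nobody will rescan the card.
bool update_lost(bool storeload_fence) {
    Memory m;
    m.field = 1;                            // old value of x.f
    Mutator t;
    t.store_field(2);                       // new store, still in the buffer
    t.post_barrier(m, storeload_fence);     // sees the dirty card and bails
    int seen = refine(m);                   // cleans the card, scans x.f
    t.drain(m);                             // store finally becomes visible
    return seen != 2 && m.card == kClean && !t.enqueued;
}
```

In this model `update_lost(false)` reports the lost update, while `update_lost(true)` does not, because draining the buffer before the card load makes the new value visible to the refinement scan.
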
>  
> igor
>  
> On Jun 27, 2013, at 1:27 AM, "Doerr, Martin" <martin.doerr at sap.com> wrote:
> 
> 
> Hi Igor,
>  
> we have seen crashes while testing our hotspot 23 based SAPJVM with G1.
> However, there’s no evidence that these crashes are caused by this problem.
> We basically found it by reading code.
>  
> Best regards,
> Martin
>  
> From: Igor Veresov [mailto:iggy.veresov at gmail.com] 
> Sent: Thursday, 27 June 2013 08:15
> To: Doerr, Martin
> Cc: John Cuthbertson; hotspot-gc-dev at openjdk.java.net; Braun, Matthias
> Subject: Re: G1 question: concurrent cleaning of dirty cards
>  
> Oh, re-read your letter, yup, there seems to be a problem. Have you observed that in practice?
>  
> igor
>  
> On Jun 26, 2013, at 9:27 PM, Igor Veresov <iggy.veresov at gmail.com> wrote:
> 
> 
> 
> The cards that are stored in the buffers are not available for concurrent processing right when they are enqueued. Instead they are passed to the processing threads when the buffer fills up. This passing of the buffer involves signaling of a condition (like pthread_cond_signal(), literally) that has a write barrier for sure, which would guarantee that the cards in the buffer, and contents of the card table, and the contents of the object are "in sync".
>  
> The only place in the generated code where there has to be a store-store barrier (for non-TSO architectures) is between the actual field store and the dirtying of the card.
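
For reference, the store-store ordering mentioned above could be sketched roughly as follows with C++ atomics. This is only an illustration with hypothetical names (`Slot`, `store_with_card_mark`); it is not the code HotSpot generates, and the release fence stands in for whatever StoreStore instruction a given platform would use.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical names throughout; this is not the code HotSpot generates.
constexpr std::uint8_t kCleanCard = 0;
constexpr std::uint8_t kDirtyCard = 1;

struct Slot {
    std::atomic<std::intptr_t> field{0};         // the reference field being written
    std::atomic<std::uint8_t>  card{kCleanCard}; // card-table byte covering it
};

// Field store followed by card dirtying. The release fence keeps the field
// store from being reordered after the card store (StoreStore), which matters
// on non-TSO machines.
// Returns true if the caller should enqueue the card.
inline bool store_with_card_mark(Slot& s, std::intptr_t new_value) {
    s.field.store(new_value, std::memory_order_relaxed);
    if (s.card.load(std::memory_order_relaxed) == kDirtyCard) {
        return false;                            // already dirty, nothing to enqueue
    }
    std::atomic_thread_fence(std::memory_order_release);
    s.card.store(kDirtyCard, std::memory_order_relaxed);
    return true;
}
```

Note that a release fence does not order the field store against the later card load, so this sketch does nothing for the StoreLoad case discussed in the rest of the thread.
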
>  
> Does this make sense?
>  
> igor
>  
>  
> On May 23, 2013, at 6:12 AM, "Doerr, Martin" <martin.doerr at sap.com> wrote:
> 
> 
> 
> Hi John,
>  
> thank you very much for your comments. Your last line explains exactly what we are concerned about.
> Does anybody plan to prevent this situation?
> I don’t want to propose adding StoreLoad barriers in all G1 post barriers because I’d expect undesired performance impact.
> Would it be feasible to rescan all cards which have been dirtied (at least once) during the next stop-the-world phase?
> Maybe anybody has a better idea.
>  
> Kind regards,
> Martin
>  
> From: John Cuthbertson [mailto:john.cuthbertson at oracle.com] 
> Sent: Thursday, 23 May 2013 02:29
> To: Doerr, Martin
> Cc: hotspot-gc-dev at openjdk.java.net; Mikael Gerdin; Braun, Matthias
> Subject: Re: G1 question: concurrent cleaning of dirty cards
>  
> Hi Martin,
> 
> An enqueued card lets the refinement threads know that the oops spanned by that card need to be walked, but we're only interested in the latest contents of the fields in those oops. IOW the oop in (3') doesn't need to be the oop stored in (1). If there's a subsequent store (3) to the same location then we want the load at (3') to see the latest contents. For example suppose we have:
> 
> x.f = a;
> x.f = b;
> 
> If the application thread sees the card spanning x.f is dirty at the second store then we won't enqueue the card after the second store. As long as the refinement thread sees 'b' when the card is 'refined' then we're OK since we no longer need to add an entry into the RSet for the region containing a - we do need an entry in the RSet for the region containing b.
> 
> If the application thread sees the card as clean at the second store before the refinement thread loads x.f we have just needlessly enqueued the card again.
> 
> Only if the application thread sees the card as dirty but the refinement thread reads 'a' could there be a problem: we would then have a missing RSet entry for 'b'.
> 
> JohnC
> 
> On 5/17/2013 1:29 AM, Doerr, Martin wrote:
> Hi all,
>  
> we have a question about the interaction between G1 post barriers and the refinement thread's concurrent dirty card cleaning.
> The case in which the G1 post barrier sees a clean card is obviously not problematic, because it will add an entry in a dirty card queue.
> However, in the case in which the Java thread (mutator thread) sees the card already dirtied, it won’t enqueue the card again, which is safe as long as its stored oop (1) is seen and processed (3’) by the parallel refinement after it has cleaned the card (1’):
>  
> Java Thread (mutator)              Refinement Thread (G1RemSet::concurrentRefineOneCard_impl calls oops_on_card_seq_iterate_careful)
>  
> (1)  store(oop)
> ( StoreLoad required here ?)
> (2)  load(card==dirty)
>  
>                                    (1’) store(card==clean)
>                                    (2’) StoreLoad barrier
>                                    (3’) load(oop)
>  
> So the refinement thread seems to rely on getting the oop which was written BEFORE the (2) load(card==dirty) was observed.
> We wonder how this ordering is guaranteed? There are no StoreLoad barriers in the Java Thread's path. (StoreLoad ordering needs explicit barriers even on TSO platforms.)
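
As a sketch of where such a barrier would sit, the two sides of the diagram can be written with C++ atomics as below. The names (`Slot`, `mutator_store`, `refine_card`) are hypothetical and the sequentially consistent fence is only a stand-in for a StoreLoad barrier; this is not how the actual G1 post barrier is written.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of one field and the card covering it; not HotSpot code.
constexpr std::uint8_t kClean = 0;
constexpr std::uint8_t kDirty = 1;

struct Slot {
    std::atomic<std::intptr_t> oop{0};
    std::atomic<std::uint8_t>  card{kDirty};
};

// Mutator side. Returns true if the card must be enqueued by the caller.
inline bool mutator_store(Slot& s, std::intptr_t v) {
    s.oop.store(v, std::memory_order_relaxed);               // (1)  store(oop)
    std::atomic_thread_fence(std::memory_order_seq_cst);     //      StoreLoad
    if (s.card.load(std::memory_order_relaxed) == kDirty) {  // (2)  load(card)
        return false;                                        // dirty: no enqueue
    }
    s.card.store(kDirty, std::memory_order_relaxed);
    return true;
}

// Refinement side. Returns the oop it scanned.
inline std::intptr_t refine_card(Slot& s) {
    s.card.store(kClean, std::memory_order_relaxed);         // (1') store(card)
    std::atomic_thread_fence(std::memory_order_seq_cst);     // (2') StoreLoad
    return s.oop.load(std::memory_order_relaxed);            // (3') load(oop)
}
```

With a fence on both sides, the C++ memory model is meant to rule out the outcome where (2) still reads dirty and (3') still reads the old oop. Dropping the mutator-side fence removes that guarantee, and keeping it puts a full fence on every reference store, which is the performance concern.
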
>  
> Kind regards,
> Martin
>  
>  
