RFR(M): 8195103: Refactor ReduceInitialCardMarks to not assume all GCs use card marks

Tue Feb 6 10:55:03 UTC 2018

Hi,

On Tue, 2018-02-06 at 10:05 +0000, Erik Osterlund wrote:
> Hi Kim,
> 
> On 6 Feb 2018, at 05:15, Kim Barrett <kim.barrett at oracle.com> wrote:
> 
> > > 
> > > > > [...]
> > > > > The optimization needs to check if an object is in young or
> > > > > not. This question is now asked to the barrier set rather
> > > > > than the heap.
> > > > > For all collectors except G1, this has been implemented by
> > > > > forwarding the question to the corresponding heap (inlined
> > > > > member function), which is what was done before. For G1, I
> > > > > chose to instead look at the card value and see if it is a
> > > > > young card, which should give the same answer.
> > > > 
> > > > Marking the cards young is done concurrently to the
> > > > application. So you could get false answers here. However it
> > > > seems that this is benign, i.e. at most too many objects are
> > > > pushed into the deferred card mark from what I can see.
> > > > 
> > > > However the assert in
> > > > CardTableModRefBs::flush_deferred_card_mark_barrier() may
> > > > complain...
> > > > i.e. at the time when the object is deferred, the result of
> > > > is_young() may be false, but at the time the deferred card mark
> > > > is flushed, is_young() will return true.
> > > > 
> > > > Note that while this occurrence is not very common, it does
> > > > happen.
> > > > 
> > > > I think this needs to be fixed. Either the mentioned assert, or
> > > > the is_young() check. The region type is always good btw.
> > > 
> > > We discussed this off-list. There is in fact no such race.
> > > The compiler slow-path first allocates new memory (TLAB or not).
> > > Then it writes young to all of the cards. Then it contemplates
> > > whether performing a card mark is necessary for non-young objects
> > > to comply with ReduceInitialCardMarks.
> > > So by the time the is_young() question is asked,
> > > Thread::current() has written the young value, which is always
> > > observable to itself. It might be that a concurrent thread
> > > overwrites this value with a monotonic card transition to the very
> > > same young value, due to crossing the same card boundary with
> > > another allocation. In either case, the young value will always
> > > be observed by the thread that performed the allocation if and
> > > only if the object then resides in young.
> > 
> > This threw me a bit too.
> > 
> > I thought there was a post-pause fill cards for young regions
> > phase. It's alluded to in a comment in
> > G1RemSet::refine_card_concurrently:
> > "The region could be young. ..." But I can't find any such code
> > now.
> > Maybe I'm misremembering?  Or maybe it got refactored out of
> > existence?
> 
> I had a vague memory about that too but after inspecting the code
> concluded this is currently not the case. I think we used to shade
> the cards young on region level after dropping a mutex but changed it
> at some time as it was a bad idea.

Just to add to the confusion :) G1, afair, never shaded a young region's
cards while holding a mutex, but always after dropping it (in
G1CollectedHeap::attempt_allocation(), i.e. without holding any
lock).

The situation for concurrent refinement is simply different: during
concurrent refinement the refining thread is *different* from the one
that sets the young card marks, so it can observe non-young cards in
the young region.

This is not the case here: the same thread that wrote the young value
reads the young region's card to determine whether the object is in
young. Obviously it always observes its own writes.
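
To make the argument concrete, here is a minimal C++ sketch of the
"same thread writes, then reads" pattern. It is only an illustration,
not HotSpot code: ToyCardTable, allocate_young_and_check(), the
512-byte card size and the card values are all assumptions made up for
this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names and constants below are hypothetical, chosen for illustration.
constexpr std::size_t  kCardShift = 9;     // assumed 512-byte cards
constexpr std::uint8_t kCleanCard = 0xff;  // assumed "clean" encoding
constexpr std::uint8_t kYoungCard = 0x02;  // assumed "young" encoding

class ToyCardTable {
 public:
  explicit ToyCardTable(std::size_t heap_bytes)
      : cards_((heap_bytes + (std::size_t{1} << kCardShift) - 1) >> kCardShift) {
    for (auto& c : cards_) c.store(kCleanCard, std::memory_order_relaxed);
  }

  // Shade every card covered by [offset, offset + size) as young.
  // Another thread storing the same young value into a shared boundary
  // card is harmless: the transition is monotonic and idempotent.
  void mark_young(std::size_t offset, std::size_t size) {
    assert(size > 0);
    const std::size_t first = offset >> kCardShift;
    const std::size_t last  = (offset + size - 1) >> kCardShift;
    assert(last < cards_.size());
    for (std::size_t i = first; i <= last; ++i) {
      cards_[i].store(kYoungCard, std::memory_order_relaxed);
    }
  }

  // Answer "is this address in young?" by looking at its card.
  bool is_young(std::size_t offset) const {
    return cards_[offset >> kCardShift].load(std::memory_order_relaxed) ==
           kYoungCard;
  }

 private:
  std::vector<std::atomic<std::uint8_t>> cards_;
};

// Models the slow path: the allocating thread shades the cards first and
// only then asks is_young(). Both steps run on the same thread, so the
// load cannot return a value older than that thread's own store.
bool allocate_young_and_check(ToyCardTable& table, std::size_t offset,
                              std::size_t size) {
  table.mark_young(offset, size);
  return table.is_young(offset);
}
```

With a 4096-byte toy heap, allocate_young_and_check(table, 600, 100)
returns true on the allocating thread, while the untouched card for
offset 0 still reads as non-young. A separate refinement thread reading
the same card would have no such guarantee without extra
synchronization.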

Thanks,
  Thomas