RFR: 8069367: assert(_nextMarkBitMap->isMarked((HeapWord*) obj)) failed

Mon Mar 9 12:24:15 UTC 2015

Hi Bengt,

On Mon, 2015-03-09 at 12:49 +0100, Bengt Rutisson wrote:
> Hi Kim,
> 
> On 2015-03-06 19:10, Kim Barrett wrote:
> > Please review this change to fix a problem in the interaction between
> > G1 concurrent marking and eager reclaim of humongous objects.
> >
> > I will need a sponsor for this change.
> >
> > The scenario we are dealing with is
> >[...]
> >
> > The additional test in concurrent marking imposes a small performance
> > degradation on concurrent marking.  Measurements of a program which
> > allocates a substantial number of objects and then does nothing but
> > repeatedly GC show a fraction of a percent increase in concurrent
> > mark time, which is well within the variance for even this contrived
> > test.  Aurora performance comparison showed no significant negative
> > impact.  Alternatives that preclean the mark stack when humongous
> > objects are reclaimed get complicated when attempting to do so without
> > extending the reclaiming evacuation pause.
> 
> Thanks for providing such a detailed description of the problem and
> solution!
> 
> One question. I assume that this situation can only occur if the 
> humongous object was live before the marking started (otherwise it would 
> have already been filtered out since it would have TAMS == bottom) and 
> someone has removed the reference to the humongous object while we were 
> marking.
> 
> Here's an attempt to show what I mean in a diagram:
> 
> H = new Humongous();
> A.h = H;
> <G1 initial mark>
> <Marking scans A and pushes H on the mark stack>
> A.h = null;
> <G1 young GC>
> <H is reclaimed since no one references it>
> <Marking continues and finds H on the mark stack>
> 
> Is this what is happening? In that case, isn't this violating the SATB 
> invariant that anything that was live when marking started is considered 
> live when it ends?

Yes. That has already been a concern with the original eager reclaim.

> Your fix will make sure the marking doesn't crash, 
> but doesn't this behavior (even prior to your fix) cause other problems?

None that I know of. The eager reclaim already made sure that there is no
other reference from a live object to the reclaimed object on the heap,
assuming the remembered sets were correct. So nobody else can
dereference the object.
What remains is the race already mentioned, where mark stacks still hold
some references to these objects.
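
To make that race concrete, here is a toy model of a mark stack that
tolerates stale entries. It is only a sketch and not the actual patch:
ToyObject, ToyMarker and all member names are made up for illustration,
and the real check in concurrent marking may look quite different.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy model only: none of these names are HotSpot identifiers.
struct ToyObject {
    int id;
    std::size_t region;  // index of the region holding the object
};

struct ToyMarker {
    std::vector<ToyObject> mark_stack;          // entries pushed by marking
    std::unordered_set<std::size_t> reclaimed;  // regions freed by eager reclaim

    void push(const ToyObject& obj) { mark_stack.push_back(obj); }

    // Eager reclaim frees the region but leaves the mark stack untouched,
    // so stale entries for that region may remain on the stack.
    void eager_reclaim(std::size_t region) { reclaimed.insert(region); }

    // Drain the stack, skipping entries whose region has been reclaimed.
    // Returns the ids of the objects that were actually scanned.
    std::vector<int> drain() {
        std::vector<int> scanned;
        while (!mark_stack.empty()) {
            ToyObject obj = mark_stack.back();
            mark_stack.pop_back();
            if (reclaimed.count(obj.region) != 0) {
                continue;  // stale entry: the object no longer exists
            }
            scanned.push_back(obj.id);
        }
        return scanned;
    }
};
```

The point of the sketch is that the extra test sits on the pop path, so
the reclaiming pause itself does no additional work.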

Ideally for this case, mark stacks would be organized on a per-region
basis, so you could just drop them during eager reclaim (or whenever a
region provably becomes empty and unreferenced). I think that is what
Kim means by such alternatives getting "complicated".
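
A rough sketch of what per-region mark stacks could look like, purely
as an illustration. G1 does not organize its mark stacks this way, and
PerRegionMarkStacks and its members are hypothetical names:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical layout: one mark stack per region, keyed by region index.
struct PerRegionMarkStacks {
    std::unordered_map<std::size_t, std::vector<int>> stacks;

    // Record a pending object (by id) on the stack of its region.
    void push(std::size_t region, int object_id) {
        stacks[region].push_back(object_id);
    }

    // Called when a region is reclaimed: discard all pending entries for
    // it in one step. Returns how many entries were dropped.
    std::size_t drop_region(std::size_t region) {
        auto it = stacks.find(region);
        if (it == stacks.end()) {
            return 0;  // nothing pending for this region
        }
        std::size_t dropped = it->second.size();
        stacks.erase(it);
        return dropped;
    }

    // Total number of entries still waiting to be scanned.
    std::size_t pending() const {
        std::size_t n = 0;
        for (const auto& entry : stacks) {
            n += entry.second.size();
        }
        return n;
    }
};
```

With such a layout the reclaim path would not need to search a shared
stack for stale entries, at the cost of a more complex stack structure.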

The conservative fix would be to disable eager reclaim during marking.
This has its own disadvantages: applications where eager reclaim
matters are often marking continuously. Even if not, marking is only
done when space is already tight. So disabling eager reclaim during
marking seems quite counterproductive.

Heap verification should be okay too: while we walk through dead objects
on the heap (which may still contain references to the reclaimed
humongous object), we do not check their references.

Thanks,
  Thomas