A couple of questions for G1 developers

Wed Jun 12 11:55:49 UTC 2019

Hi Andrew,

On Wed, 2019-06-12 at 10:24 +0100, Andrew Haley wrote:
> On 6/3/19 2:52 PM, Thomas Schatzl wrote:
> > The racing components are on the one hand the allocating mutator
> > thread, and on the other hand refinement. While the card table
> > contains a special "is-young" value for newly allocated regions,
> > setting this "is-young" value is racy with the mutators. So you
> > might end up with refinement looking at not fully initialized
> > contents of the klass value, where a NULL klass serves as indicator
> > for that situation (and refinement should give up refining that
> > card for now).
> 
> So there's something I still don't get. (Please have patience with
> me.)  When an object is allocated (in Eden space, say) its memory is
> not zeroed until after it has been allocated whereupon it becomes
> visible to a collector. What am I misunderstanding?
> 

Java heap memory zeroing and visibility work as usual. I was referring
to the corresponding card table memory. The fully initialized Java
object may become visible at a different time than the corresponding
card table memory.

To decrease the cost of the post-write barrier, mainly the StoreLoad,
the barrier first checks whether the card we want to dirty is a "young"
card. If so, it exits early, as G1 never needs to refine such a card
(or carry it in the remembered set).
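A minimal sketch of that fast path, in plain C++ rather than HotSpot
code. The card values, the card size and the names CardTableModel and
post_write_barrier are made up for illustration; the real barrier also
has other filters (e.g. for NULL stores) that are left out here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical card values and card size; not HotSpot's actual constants.
enum CardValue : std::uint8_t { kDirtyCard = 0, kYoungCard = 1, kCleanCard = 2 };
constexpr std::size_t kCardShift = 9;  // one card per 512 heap bytes (assumed)

struct CardTableModel {
  std::vector<std::atomic<std::uint8_t>> cards;
  std::vector<std::size_t> refinement_queue;  // single-threaded stand-in

  explicit CardTableModel(std::size_t num_cards) : cards(num_cards) {
    for (auto& c : cards) c.store(kCleanCard, std::memory_order_relaxed);
  }

  // Returns true if the card was dirtied and enqueued for refinement.
  bool post_write_barrier(std::size_t heap_offset) {
    const std::size_t index = heap_offset >> kCardShift;
    assert(index < cards.size());
    std::atomic<std::uint8_t>& card = cards[index];

    // Fast path: a "young" card never needs refinement, so exit
    // before paying for the fence below.
    if (card.load(std::memory_order_relaxed) == kYoungCard) return false;

    // Stand-in for the StoreLoad barrier: order the preceding reference
    // store before the card re-read.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // Already dirty: some earlier write enqueued this card.
    if (card.load(std::memory_order_relaxed) == kDirtyCard) return false;

    card.store(kDirtyCard, std::memory_order_relaxed);
    refinement_queue.push_back(index);
    return true;
  }
};
```

Note that the young check is a plain load before the fence, which is
exactly why it can observe a stale (not yet "young") card value.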

When allocating a new eden region, G1 sets the corresponding cards to
"young" outside of a global lock, iirc, so setting these cards to
"young" is subject to the usual visibility races. I.e. refinement (in
G1RemSet::refine_card_concurrently()) may get cards in young regions.
Looking at the code again right now, there are actually other checks
that filter out these cards in young gen before actually iterating over
the objects.
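Roughly, the filtering on the refinement side could look like the
sketch below. This is not the code of
G1RemSet::refine_card_concurrently(); the names RegionKind,
RefineResult and refine_card and the card values are assumptions for
illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical card values; not HotSpot's actual constants.
enum CardValue : std::uint8_t { kDirtyCard = 0, kYoungCard = 1, kCleanCard = 2 };
enum class RegionKind { Free, Young, Old, Humongous };
enum class RefineResult { Scanned, NotDirty, NotOldOrHumongous };

// Decide whether a card taken from the refinement queue should be scanned.
RefineResult refine_card(std::atomic<std::uint8_t>& card, RegionKind region) {
  const std::uint8_t value = card.load(std::memory_order_acquire);
  assert(value <= kCleanCard);  // model invariant: only the three values above

  // Someone else already handled the card, or it was marked young.
  if (value != kDirtyCard) return RefineResult::NotDirty;

  // A card that raced with the "young" marking can still arrive here as
  // dirty; the region type filters it out before any object is touched.
  if (region != RegionKind::Old && region != RegionKind::Humongous) {
    return RefineResult::NotOldOrHumongous;
  }

  // Clean the card first, then fence, so that a mutator write racing
  // with the scan re-dirties the card instead of being lost.
  card.store(kCleanCard, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);

  // ... iterate over the objects on the card here ...
  return RefineResult::Scanned;
}
```

In this model a dirty card in a young region is simply dropped without
being cleaned or scanned.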

There is however at least (maybe only?) one other case where you can
get (valid) NULL klass pointers during refinement: humongous object
allocation.

Example:
- the mutator changes some cards in a humongous object X, enqueuing
  cards a, b, c for refinement
- a remark pause reclaims X (G1 does not filter out a, b, c)
- the mutator allocates humongous object Y at the same spot
- while that allocation is ongoing, refinement threads look at cards
  a, b, c

(See HeapRegion::oops_on_card_seq_iterate_careful() /
do_oops_on_card_in_humongous() and the comment in
G1CollectedHeap::humongous_obj_allocate_initialize_regions() at the
OrderAccess::storestore() call site).
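The "NULL klass means give up" protocol can be sketched like this.
Again this is a simplified model and not the HotSpot code: the types
Klass and HumongousObject and the functions allocate_and_publish and
try_refine are hypothetical, and the release store is only loosely
analogous to the OrderAccess::storestore() in the real allocation path.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct Klass { std::size_t size_in_words; };  // hypothetical stand-in

struct HumongousObject {
  std::atomic<const Klass*> klass{nullptr};  // NULL while under construction
  std::size_t field = 0;                     // stand-in for the object body
};

// Mutator side: initialize the body first, publish the klass last.
void allocate_and_publish(HumongousObject& obj, const Klass* k, std::size_t v) {
  assert(k != nullptr);
  obj.field = v;
  // Release ordering: the body store above may not be reordered after
  // the klass store.
  obj.klass.store(k, std::memory_order_release);
}

// Refinement side: returns false if the object is not yet parsable, in
// which case the caller should give up on the card for now.
bool try_refine(const HumongousObject& obj, std::size_t& seen) {
  const Klass* k = obj.klass.load(std::memory_order_acquire);
  if (k == nullptr) return false;  // allocation still in progress
  seen = obj.field;                // safe: ordered after the klass load
  return true;
}
```

A refinement thread that looks at a stale card for the reclaimed X
while Y is still being set up takes the "return false" path here.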

I doubt this relates to your case, as the crashes you experience are
within the STW pause processing; also you did not mention humongous
objects :) Concurrent refinement might have trashed the BOT (block
offset table) before that GC though; in this case the reason could be
multiple refinement threads doing HeapRegion::block_start() in
HeapRegion::oops_on_card_seq_iterate_careful().

Hth,
  Thomas