RFR: 8166607: G1 needs klass_or_null_acquire

Mon Oct 10 13:54:36 UTC 2016

On 2016-09-29, Kim Barrett wrote:
> Please review this change to G1 to use klass_or_null_acquire where
> needed.
> 
> [Reviewing on hotspot-dev; only GC code is involved, but some non-GC
> folks have expressed interest.]
> 
> G1 BOT uses of klass_or_null have all been changed to instead use
> klass_or_null_acquire.  This isn't necessary for the call in the
> revised version of oops_on_card_seq_iterate_careful (discussed below),
> but there can be (and are) other callers.  The performance impact of
> unnecessary barriers in oops_on_card_xxx will hopefully not be too bad
> on non-TSO systems; the number reduces to one as the BOT gets filled
> in.  Rather than splitting or otherwise conditionalizing, for now we
> just make it safe.  [The CollectedHeap block_start &etc API needs some
> TLC in the face of collectors that use klass_or_null therein, since
> the answer to a call to block_start could change instantaneously.  And
> some (all?) of the existing not-GC-specific callers probably should be
> using block_start_const rather than block_start.]
> 
> Changed oops_on_card_seq_iterate_careful to use different code paths
> when the region is humongous or not.
> 
> The humongous case is simplified because there can only be the one
> (potential) object in the region.  Use klass_or_null_acquire to
> determine whether the object has been "published" or allocation is
> still in-progress.  The acquire barrier ensures processing the
> humongous object is safe.  When the klass is NULL the card marking
> must be stale and can be ignored.
> 
> In the non-humongous case, we know we're dealing with an old-gen
> region.  Because of that, we know we don't need to deal with
> in-progress allocations here.  That allows many simplifications.
> 
> As a result of the changes in oops_on_card_seq_iterate_careful, we now
> almost never fail to process the card.  The only place where that can
> occur is a stale card in a humongous region with an in-progress
> allocation, where we can just ignore it.  So the only caller,
> refine_card, no longer needs to examine the result of the call and
> enqueue the card for later reconsideration.

I'm not sure this assumption is correct. Due to JDK-8166711 [0], can't
we have the following scenario:

1. The compiler (and/or CPU) reorders the loads for HeapRegion::_top and
   HeapRegion::_type in oops_on_card_seq_iterate_careful so that
   HeapRegion::_type is read before HeapRegion::_top.
2. There is a stale card pointing into a free region
3. The concurrent refinement thread begin executing
   oops_on_card_seq_iterate_careful fob the card in step two.
4. The first check is now (due to reordering), if (is_young()). This is
   false, since HeapRegion::_type is free.
5. A mutator thread now allocates a new object in the above free region,
   turning it into a young region. The mutator increases
   HeapRegion::_top, but has not yet finished writing the new object.
6. The concurrent refinement thread now performs the second check,
   if (mr.is_empty()). Since _top no longer equals bottom, and the stale
   card might point at the beginning of the region, this check might be
   false as well.
7. The humongous check will fail since this not a humongous region
8. The code below the humongous must now deal with a not yet "published"
   object in a young region, but isn't prepared to do that.

What do you think? This is very tricky code to analyze, but I think this
patch won't work in the above scenario.

When JDK-8166711 [0] is solved however, then I do think that this patch
will work. The important part for getting this patch to work is to ensure
that the two first checks, mr.is_empty() and is_young(), aren't
reordered.

Thanks,
Erik

[0]: https://bugs.openjdk.java.net/browse/JDK-8166811

> Note that some of the complexity in refine_card / oops_on_card_xxx
> arises from the possibility of "stale" marked cards, especially during
> concurrent refinement.  These are cards in regions that have been
> freed and then reallocated.  At present the only source of stale cards
> in the concurrent case seems to be HCC eviction.  (Eager reclaim of
> humongous objects flushes relevant remembered sets (possibly
> containing stale entries) for reconsideration during that GC, but
> in-GC card clipping to scan_top deals with that.)  Doing HCC cleanup
> when freeing regions might remove the need for klass_or_null checking
> in the humongous case for concurrent refinement, so might be worth
> looking into later.
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8166607
> 
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8166607/webrev.01/
> 
> Testing:
> Nightly test equiv. via rbt. (in progress)
> Local specjbb2015 (non-perf).
> Local jtreg hotspot_all.
>