RFR: 8166607: G1 needs klass_or_null_acquire

Mon Nov 7 19:38:25 UTC 2016

> On Nov 7, 2016, at 5:53 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> On Tue, 2016-10-25 at 19:11 -0400, Kim Barrett wrote:
>>> 
>>> On Oct 21, 2016, at 9:54 PM, Kim Barrett <kim.barrett at oracle.com>
>>> wrote:
>>> 
>>>> 
>>>> On Oct 21, 2016, at 8:46 PM, Kim Barrett <kim.barrett at oracle.com>
>>>> wrote:
>>>> In the humongous case, if it bails because klass_or_null == NULL,
>>>> we must re-enqueue
>>>> the card …
>> This update (webrev.02) reverts part of the previous change.
>> 
>> In the original RFR I said:
>> 
>>   As a result of the changes in oops_on_card_seq_iterate_careful, we
>>   now almost never fail to process the card.  The only place where
>>   that can occur is a stale card in a humongous region with an
>>   in-progress allocation, where we can just ignore it.  So the only
>>   caller, refine_card, no longer needs to examine the result of the
>>   call and enqueue the card for later reconsideration.
>> 
>> Ignoring such a stale card is incorrect at the point where it was
>> being done.  At that point we've already cleaned the card, so we must
>> either process the designated object(s) or, if we can't do the
>> processing because of in-progress allocation (klass_or_null returned
>> NULL), then re-queue the card for later reconsideration.
>> 
>> So the change to refine_card to eliminate that behavior, and the
>> associated changes to oops_on_card_seq_iterate_careful, were a
>> mistake, and are being reverted by this new version.  As a result,
>> refine_card is no longer changed at all.
> 
> Thanks for catching this.
> 
> Maybe it would be cleaner to call a method in the barrier set instead
> of inlining the dirtying + enqueuing in lines 685 to 691? Maybe as an
> additional RFE.

We could use _ct_bs->invalidate(dirtyRegion).  That's rather
overgeneralized and inefficient for this situation, but this situation
should occur *very* rarely; it requires a stale card get processed
just as a humongous object is in the midst of being allocated in the
same region.

>> Additionally, in the original RFR I also said:
>> 
>>   Note that [...] At present the only source of stale cards in the
>>   concurrent case seems to be HCC eviction.  [...]  Doing HCC cleanup
>>   when freeing regions might remove the need for klass_or_null
>>   checking in the humongous case for concurrent refinement, so might
>>   be worth looking into later.
>> 
>> That was also incorrect; there are other sources of stale cards.
> 
> Can you elaborate on that?

Here's a scenario that I've observed while running a jtreg test (I
think it was hotspot/test/gc/TestHumongousReferenceObject).

We have humongous object H, referring to young object Y.  This induces
a remembered set entry for card C in region R (allocated for H).

H becomes unreachable.
Start concurrent collection cycle.
Pause Initial Mark scan_rs pushes &H->Y onto mark stack.
Pause Initial Mark evac processes &H->Y, copying Y, updating &H->Y,
  and adding C to g1h_dcqs in update_rs.
Pause Initial Mark redirty_logged_cards dirties g1h_dcqs entries, including C.
Pause Initial Mark merges g1h_dcqs into java_dcqs, adding dirty C to java_dcqs.
Concurrent Mark determines H is dead.
Pause Cleanup frees regions for H, including R.
Concurrent Refinement finally comes across stale C in now (possibly) free R.

A similar situation can arise if instead of H we have old O in region
R and all objects in R are unreachable before starting concurrent
collection, so that Pause Cleanup frees R.