Assertion failure on PPC64 after 8200545: Improve filter for enqueued deferred cards

Doerr, Martin martin.doerr at sap.com
Thu May 23 17:13:46 UTC 2019


Hi Thomas,

thanks for your explanations. I have created JDK-8224681. Feel free to edit it.

Thanks,
Martin


> -----Original Message-----
> From: Thomas Schatzl <thomas.schatzl at oracle.com>
> Sent: Thursday, May 23, 2019 14:27
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: hotspot-gc-dev at openjdk.java.net; Reingruber, Richard
> <richard.reingruber at sap.com>
> Subject: Re: Assertion failure on PPC64 after 8200545: Improve filter for
> enqueued deferred cards
> 
> Hi,
> 
> On Thu, 2019-05-23 at 10:21 +0000, Doerr, Martin wrote:
> > Hi Thomas,
> >
> > we observe a sporadically failing assertion on PPC64:
> > assert(region_attr.needs_remset_update() ==
> > hr_obj->rem_set()->is_tracked()) failed: State flag indicating remset
> > tracking disagrees (false) with actual remembered set (true) for
> > region 62 with region:
> > >  62|0x00000000f3e00000, 0x00000000f3f00000,
> > > 0x00000000f3f00000|100%| O|  |TAMS 0x00000000f3f00000,
> > > 0x00000000f3e00000| Complete
> >
> > (This pattern has shown up twice, just with a different heap region,
> > once on linuxppc64le and once on AIX.)
> 
> Do you happen to have the other stacktrace and region information as
> above too? The situation we are in "should not happen" :(, see below
> why.
> 
> > Is the assertion too strict? I guess we could allow false positives
> > of region_attr.needs_remset_update(), right?
> 
> We could allow erroneous "true" values (that just adds another card to
> scan by concurrent refinement), but not wrong "false" ones. These may
> cause missing remembered set entries.
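
The asymmetry described above can be sketched as a relaxed check. This is only a toy model under stated assumptions: RegionState, strict_check() and relaxed_check() are hypothetical names, not HotSpot code, and the two booleans merely stand in for region_attr.needs_remset_update() and hr_obj->rem_set()->is_tracked().

```cpp
#include <cassert>

// Hypothetical stand-ins for the two pieces of state the assert compares.
struct RegionState {
  bool attr_needs_remset_update;  // cached flag in the region attribute table
  bool remset_is_tracked;         // what the remembered set state actually says
};

// Strict form, mirroring the failing assert: both flags must agree.
inline bool strict_check(const RegionState& r) {
  return r.attr_needs_remset_update == r.remset_is_tracked;
}

// Relaxed form: a tracked remembered set implies that the flag is set.
// A spurious "true" flag passes (extra card to refine), a wrong "false" does not.
inline bool relaxed_check(const RegionState& r) {
  return !r.remset_is_tracked || r.attr_needs_remset_update;
}
```

Note that the reported failure (flag false, remembered set tracked) is rejected by both forms, so relaxing the assert this way would not hide it.
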
> 
> > Or do you have an idea about a real problem like missing memory
> > barriers?
> 
> Let me explain a bit why this particular situation is weird after (more
> than) a few minutes looking over the code.
> 
> The needs_remembered_set_update region attribute table is set in one of
> two locations:
> 
> - at the start of GC, before actual evacuation (in that stack trace we
> are already multiple parallel phases beyond that), just duplicating the
> values from HeapRegionRemSet::_state (in
> G1CollectedHeap::register_regions_with_region_attr()).
> 
> We set region_attr.needs_remembered_set_update() for all committed
> regions at that point.
> 
> I.e. imho it is impossible to get a wrong value from this, given the
> various full barriers between that point and the time of the crash.
> 
> - during copying, when allocating new regions.
> 
> The remembered set state (HeapRegionRemSet::_state) and the
> corresponding region attribute table entry are set one after another
> during allocation of a new region in
> G1CollectedHeap::new_gc_alloc_region(), that is called by
> G1AllocRegion::new_alloc_region_and_allocate(). So they should
> correspond, although exactly the situation you describe may occur.
> 
> (Note that this means that any memory visibility issue when setting
> these existed before 8200545, because previously G1 simply checked the
> value of HeapRegionRemSet::_state.
> 
> The default value of needs_remembered_set_update of the region_attr
> elements is "false" though; I just saw that there is an implicit type
> coercion in the G1HeapRegionAttr constructor going on, but I assume
> that "false" is "0" anyway. That is something that could be tried, i.e.
> set the default to true and adapt the assert. That would mean we are
> not initializing some region attr correctly. Or something is setting
> wrong "false" values in there... probably not very useful...).
> 
> However, by default, newly allocated old gen regions never get assigned
> a remembered set to update: in
> G1RemSetTrackingPolicy::update_at_allocate() we should take the
> r->is_old() path, which sets the new remembered set state to empty. Any
> newly allocated region is a "Free" region, and a "Free" region's
> remembered set state is "Empty" too (in HeapRegion::hr_clear() we call
> HeapRegionRemSet::clear_locked(), which calls
> HeapRegionRemSet::set_state_empty() when freeing them).
> 
> So if this were a newly allocated region, it must have had an "Empty"
> remembered set in the region information dump.
> 
> ------------------
> 
> The situation is different for newly allocated survivor regions: their
> remembered sets get allocated as "Complete". However, I believe that is
> fine too because:
> 
> - if the object we are scanning (i.e. "p") is in some survivor region:
> we do not call enqueue_card_if_tracked() from these regions at all, as
> the code in G1ScanEvacuatedObjClosure::do_oop_work() shows. In this
> case the thread-local _scanning_in_young == True.
> 
> - if the object we are scanning (i.e. "p") is in some old region (must
> be the case for the crash according to the logs), i.e. we are scanning
> an object recently promoted into some old region: if the "obj"
> reference is in any kind of old region, the region attribute should
> have been stable (which it apparently is not) because either
>   - that region was allocated during that gc. Its
> HeapRegionRemSet::_state (and region attribute) must have been "Empty"
> due to the above.
>   - that region was allocated before gc: it may have any
> HeapRegionRemSet::_state, but since we update that before the parallel
> phases, it must be stable.
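
The allocation rules discussed above can be modelled in a few lines. This is a minimal sketch, not the HotSpot implementation: Kind, RemSetState, Region, is_tracked(), free_region() and allocate_during_gc() are hypothetical names, and the model only has the two remembered set states mentioned in this thread.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model; these are not the HotSpot types.
enum class Kind { Free, Survivor, Old };
enum class RemSetState { Empty, Complete };

struct Region {
  Kind kind = Kind::Free;
  RemSetState remset = RemSetState::Empty;  // stands in for HeapRegionRemSet::_state
  bool attr_needs_update = false;           // stands in for the region attribute entry
};

inline bool is_tracked(const Region& r) {
  return r.remset != RemSetState::Empty;
}

// Freeing a region resets its remembered set state to Empty.
inline void free_region(Region& r) {
  r.kind = Kind::Free;
  r.remset = RemSetState::Empty;
}

// Allocation during GC: the state and the attribute are written one after
// another. Survivor regions start "Complete", old regions start "Empty".
inline void allocate_during_gc(Region& r, Kind kind) {
  r.kind = kind;
  r.remset = (kind == Kind::Survivor) ? RemSetState::Complete
                                      : RemSetState::Empty;
  r.attr_needs_update = is_tracked(r);
}
```

In this model an old region allocated during the GC can never show a "Complete" remembered set, which is why the region dump in the failing assert is surprising.
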
> 
> -------------------
> 
> So unfortunately I do not know right away how
> region_attr.needs_remembered_set_update() could get inconsistent with
> the corresponding HeapRegionRemSet::_state in this context.
> 
> At least it failed before missing some remembered set entry....
> 
> Can you file a bug and assign it to me? I will think about it some
> more. Or maybe you can spot the problem(s) in my thinking?
> 
> Thanks,
>   Thomas
> 


