Assertion failure on PPC64 after 8200545: Improve filter for enqueued deferred cards
Doerr, Martin
martin.doerr at sap.com
Fri May 24 11:02:02 UTC 2019
Hi Thomas,
I've taken a 2nd look at the hs_err files.
Seems like the GCTaskThread which runs into the assertion has seen
hr_obj->rem_set()->is_tracked() == true
but another thread concurrently sets
r->rem_set()->set_state_complete()
When the hs_err file gets printed, the region shows up as "Complete" (which means untracked).
Does this make sense? In which scenario can this happen?
Best regards,
Martin
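
The suspected sequence above can be replayed in a small, single-threaded toy model. This is only a sketch under stated assumptions: ToyRegion, ToyAttrTable, strict_check() and replay_suspected_interleaving() are hypothetical names, not HotSpot code, and the "other thread" is simulated by an ordinary assignment after the snapshot. It shows only that the strict equality check fails if the remembered set state changes after the region attribute was copied from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these types are HotSpot classes.
struct ToyRegion {
  bool remset_tracked = false;  // stands in for rem_set()->is_tracked()
};

struct ToyAttrTable {
  std::vector<bool> needs_remset_update;

  // Copy the live remembered set state once, as is done at the start of a GC.
  void snapshot(const std::vector<ToyRegion>& regions) {
    needs_remset_update.assign(regions.size(), false);
    for (std::size_t i = 0; i < regions.size(); ++i) {
      needs_remset_update[i] = regions[i].remset_tracked;
    }
  }
};

// The strict form of the check: cached attribute and live state must be equal.
inline bool strict_check(const ToyAttrTable& table,
                         const std::vector<ToyRegion>& regions,
                         std::size_t i) {
  return table.needs_remset_update[i] == regions[i].remset_tracked;
}

// Replays the suspected order of events sequentially: snapshot first, then a
// state change that is not mirrored into the table, then the check.
// Returns whether the strict check still holds afterwards.
inline bool replay_suspected_interleaving() {
  std::vector<ToyRegion> regions(1);
  ToyAttrTable table;
  table.snapshot(regions);           // cached attribute is "false"
  regions[0].remset_tracked = true;  // assumed later change by another thread
  return strict_check(table, regions, 0);
}
```

In this model the check holds as long as nothing touches the region after snapshot() and fails with attribute "false" versus tracked "true" otherwise, which matches the values in the assertion message but says nothing about which code path causes the change.
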
> -----Original Message-----
> From: hotspot-gc-dev <hotspot-gc-dev-bounces at openjdk.java.net> On Behalf
> Of Doerr, Martin
> Sent: Thursday, May 23, 2019 19:14
> To: Thomas Schatzl <thomas.schatzl at oracle.com>
> Cc: hotspot-gc-dev at openjdk.java.net
> Subject: [CAUTION] RE: Assertion failure on PPC64 after 8200545: Improve
> filter for enqueued deferred cards
>
> Hi Thomas,
>
> thanks for your explanations. I have created JDK-8224681. Feel free to edit it.
>
> Thanks,
> Martin
>
>
> > -----Original Message-----
> > From: Thomas Schatzl <thomas.schatzl at oracle.com>
> > Sent: Thursday, May 23, 2019 14:27
> > To: Doerr, Martin <martin.doerr at sap.com>
> > Cc: hotspot-gc-dev at openjdk.java.net; Reingruber, Richard
> > <richard.reingruber at sap.com>
> > Subject: Re: Assertion failure on PPC64 after 8200545: Improve filter for
> > enqueued deferred cards
> >
> > Hi,
> >
> > On Thu, 2019-05-23 at 10:21 +0000, Doerr, Martin wrote:
> > > Hi Thomas,
> > >
> > > we observe a sporadically failing assertion on PPC64:
> > > assert(region_attr.needs_remset_update() ==
> > > hr_obj->rem_set()->is_tracked()) failed: State flag indicating
> > > remset tracking disagrees (false) with actual remembered set (true)
> > > for region 62 with region:
> > > > 62|0x00000000f3e00000, 0x00000000f3f00000,
> > > > 0x00000000f3f00000|100%| O| |TAMS 0x00000000f3f00000,
> > > > 0x00000000f3e00000| Complete
> > >
> > > (This pattern has shown up twice, just with a different heap region,
> > > once on linuxppc64le and once on AIX.)
> >
> > Do you happen to have the other stacktrace and region information as
> > above too? The situation we are in "should not happen" :(, see below
> > why.
> >
> > > Is the assertion too strict? I guess we could allow false positives
> > > of region_attr.needs_remset_update(), right?
> >
> > We could allow erroneous "true" values (that just adds another card to
> > scan by concurrent refinement), but not wrong "false" ones. These may
> > cause missing remembered set entries.
> >
> > > Or do you have an idea about a real problem like missing memory
> > > barriers?
> >
> > Let me explain a bit why this particular situation is weird after (more
> > than) a few minutes looking over the code.
> >
> > The needs_remembered_set_update region attribute table is set in one of
> > two locations:
> >
> > - at the start of GC, before actual evacuation (in that stack trace we
> > are already multiple parallel phases beyond that), just duplicating the
> > values from HeapRegionRemSet::_state (in
> > G1CollectedHeap::register_regions_with_region_attr()).
> >
> > We set the region_attr.needs_remembered_set_update() for all committed
> > regions at that time there.
> >
> > I.e. it is impossible imho to get a wrong value from that, because of
> > the various full barriers between that time and the time of the crash.
> >
> > - during copying, when allocating new regions.
> >
> > The remembered set state (HeapRegionRemSet::_state) and the
> > corresponding region attribute table entry are set one after another
> > during allocation of a new region in
> > G1CollectedHeap::new_gc_alloc_region(), that is called by
> > G1AllocRegion::new_alloc_region_and_allocate(). So they should
> > correspond, although exactly the situation you describe may occur.
> >
> > (Note that this means that any memory visibility issue when setting
> > these existed before 8200545, because previously G1 simply checked the
> > value of HeapRegionRemSet::_state.
> >
> > The default value of needs_remembered_set_update of the region_attr
> > elements is "false" though; I just saw that there is an implicit type
> > coercion in the G1HeapRegionAttr constructor going on, but I assume
> > that "false" is "0" anyway. That is something that could be tried, i.e.
> > set the default to true and adapt the assert. That would mean we are
> > not initializing some region attr correctly. Or something is setting
> > wrong "false" values in there... probably not very useful...).
> >
> > However, by default, newly allocated old gen regions never get assigned
> > a remembered set to update (in
> > G1RemSetTrackingPolicy::update_at_allocate() we should use the
> > r->is_old() path, setting its new remembered set state to empty). Any
> > newly allocated region is a "Free" region.
> > A "Free" region's remembered set state is "Empty" too (in
> > HeapRegion::hr_clear() we call HeapRegionRemSet::clear_locked() which
> > calls HeapRegionRemSet::set_state_empty() when freeing them).
> >
> > So if this were a newly allocated region, it must have had an "Empty"
> > remembered set in the region information dump.
> >
> > ------------------
> >
> > The situation is different for newly allocated survivor regions: their
> > remembered sets get allocated as "Complete". However I believe that is
> > fine too because:
> >
> > - if that object we are scanning (i.e. "p") is in some survivor region:
> > we do not call enqueue_card_if_tracked() from these regions at all as
> > the code in G1ScanEvacuatedObjClosure::do_oop_work() shows. In this
> > case the thread local _scanning_in_young == True.
> >
> > - if that object we are scanning (i.e. "p") is in some old region (it
> > must be for the crashing case according to the logs), i.e. we are
> > scanning an object recently promoted into some old region: if the "obj"
> > reference is in any kind of old region, the region attribute should
> > have been stable (which it apparently is not) because either
> > - that region was allocated during that gc. Its
> > HeapRegionRemSet::_state (and region attribute) must have been "Empty"
> > due to the above.
> > - that region was allocated before gc: it may have any
> > HeapRegionRemSet::_state, but since we update that before the parallel
> > phases, it must be stable.
> >
> > -------------------
> >
> > So unfortunately I do not know right away how
> > region_attr.needs_remembered_set_update() could get inconsistent with
> > the corresponding HeapRegionRemSet::_state in this context.
> >
> > At least it failed before missing some remembered set entry....
> >
> > Can you file a bug and assign it to me? I will think about it some
> > more. Or maybe you can spot the problem(s) in my thinking?
> >
> > Thanks,
> > Thomas
> >
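
Thomas's remark in the quoted thread, that an erroneous "true" in the region attribute is harmless while a wrong "false" is not, can be written down as a one-sided check. This is a minimal sketch, not a proposed patch: relaxed_attr_is_safe() is a hypothetical helper, and its two bool parameters merely stand in for region_attr.needs_remset_update() and hr_obj->rem_set()->is_tracked().

```cpp
#include <cassert>

// Hypothetical helper; not part of HotSpot.
// attr_needs_update: cached region attribute (needs remembered set update).
// remset_tracked:    live state of the region's remembered set.
inline bool relaxed_attr_is_safe(bool attr_needs_update, bool remset_tracked) {
  // Only one combination is unsafe: the region is tracked, but the cached
  // attribute says no update is needed, so a card would be dropped.
  return attr_needs_update || !remset_tracked;
}
```

Note that the reported failure (attribute "false", remembered set "true") is exactly the combination this relaxed form still rejects, so loosening the assertion in this way would not have hidden it.
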