Assertion failure on PPC64 after 8200545: Improve filter for enqueued deferred cards
Thomas Schatzl
thomas.schatzl at oracle.com
Thu May 23 12:26:58 UTC 2019
Hi,
On Thu, 2019-05-23 at 10:21 +0000, Doerr, Martin wrote:
> Hi Thomas,
>
> we observe sporadically failing assertion on PPC64:
> assert(region_attr.needs_remset_update() == hr_obj->rem_set()-
> >is_tracked()) failed: State flag indicating remset tracking
> disagrees (false) with actual remembered set (true) for region 62
> with region:
> > 62|0x00000000f3e00000, 0x00000000f3f00000,
> > 0x00000000f3f00000|100%| O| |TAMS 0x00000000f3f00000,
> > 0x00000000f3e00000| Complete
>
> (This pattern has shown up twice, just with a different heap region,
> once on linuxppc64le and once on AIX.)
Do you happen to have the other stack trace and region information, as
above, too? The situation we are in "should not happen" :(, see below
for why.
> Is the assertion too strict? I guess we could allow false positives
> of region_attr.needs_remset_update(), right?
We could allow erroneous "true" values (that just adds another card to
scan by concurrent refinement), but not wrong "false" ones. Those may
cause missing remembered set entries.
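To make the asymmetry concrete, here is a minimal sketch of the strict
check versus a relaxed one. The function names and the plain bool
parameters are hypothetical stand-ins, not the actual HotSpot code.

```cpp
// Simplified stand-in for the check; NOT the actual G1 code.
// attr_needs_update: cached flag from the region attribute table.
// remset_tracked:    what the region's remembered set itself reports.

// Strict form, as in the failing assert: both views must agree.
inline bool strict_check(bool attr_needs_update, bool remset_tracked) {
  return attr_needs_update == remset_tracked;
}

// Relaxed form: a wrong "true" is tolerated (one extra card for
// concurrent refinement), but "false" for a tracked region is an error.
inline bool relaxed_check(bool attr_needs_update, bool remset_tracked) {
  return attr_needs_update || !remset_tracked;
}
```

The failure reported above is the (false, true) combination, which both
forms reject, so relaxing the assert this way would not hide it.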
> Or do you have an idea about a real problem like missing memory
> barriers?
Let me explain a bit why this particular situation is weird after (more
than) a few minutes looking over the code.
The needs_remembered_set_update region attribute table is set in one of
two locations:
- at the start of GC, before actual evacuation (in that stack trace we
are already multiple parallel phases beyond that), just duplicating the
values from HeapRegionRemSet::_state (in
G1CollectedHeap::register_regions_with_region_attr()).
We set region_attr.needs_remembered_set_update() for all committed
regions at that point.
I.e. imho it is impossible to get a wrong value from this, given the
various full barriers between that point and the time of the crash.
- during copying, when allocating new regions.
The remembered set state (HeapRegionRemSet::_state) and the
corresponding region attribute table entry are set one after another
during allocation of a new region in
G1CollectedHeap::new_gc_alloc_region(), that is called by
G1AllocRegion::new_alloc_region_and_allocate(). So they should
correspond, although exactly the situation you describe may occur.
(Note that this means that any memory visibility issue when setting
these existed before 8200545, because previously G1 simply checked the
value of HeapRegionRemSet::_state.
The default value of needs_remembered_set_update of the region_attr
elements is "false" though; I just saw that there is an implicit type
coercion in the G1HeapRegionAttr constructor going on, but I assume
that "false" is "0" anyway. That is something that could be tried, i.e.
set the default to true and adapt the assert. That would mean we are
not initializing some region attr correctly. Or something is setting
wrong "false" values in there... probably not very useful...).
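The two update points above can be sketched with a toy model. All names
here (ToyState, ToyRegion, ToyHeap) are made up for illustration, and
the model is single-threaded, so it says nothing about memory ordering.

```cpp
#include <cstddef>
#include <vector>

// Toy model only; these types are hypothetical, not HotSpot's.
enum class ToyState { Empty, Complete };

struct ToyRegion {
  ToyState state = ToyState::Empty;
  bool is_tracked() const { return state != ToyState::Empty; }
};

struct ToyHeap {
  std::vector<ToyRegion> regions;
  // Stand-in for the region attribute table; default is "false".
  std::vector<bool> needs_update;

  explicit ToyHeap(std::size_t n) : regions(n), needs_update(n, false) {}

  // Point 1: at the start of GC, copy every region's remembered set
  // state into the attribute table.
  void register_all() {
    for (std::size_t i = 0; i < regions.size(); ++i) {
      needs_update[i] = regions[i].is_tracked();
    }
  }

  // Point 2: when a region is allocated during GC, set the remembered
  // set state and the table entry one after another.
  void allocate_during_gc(std::size_t i, ToyState s) {
    regions[i].state = s;
    needs_update[i] = regions[i].is_tracked();
  }

  // The property the failing assert checks, for one region.
  bool consistent(std::size_t i) const {
    return needs_update[i] == regions[i].is_tracked();
  }
};
```

In this model a region can only be inconsistent if its state changes
without going through one of the two update points.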
However, by default, newly allocated old gen regions never get assigned
a remembered set to update (in
G1RemSetTrackingPolicy::update_at_allocate() we should use the
r->is_old() path, setting its new remembered set state to empty). Any
newly allocated region is a "Free" region.
A "Free" region's remembered set state is "Empty" too (in
HeapRegion::hr_clear() we call HeapRegionRemSet::clear_locked() which
calls HeapRegionRemSet::set_state_empty() when freeing them).
So if this were a newly allocated region, it must have had an "Empty"
remembered set in the region information dump.
------------------
The situation is different for newly allocated survivor regions: their
remembered sets get allocated as "Complete". However I believe that is
fine too because:
- if that object we are scanning (i.e. "p") is in some survivor region:
we do not call enqueue_card_if_tracked() from these regions at all as
the code in G1ScanEvacuatedObjClosure::do_oop_work() shows. In this
case the thread local _scanning_in_young == True.
- if that object we are scanning (i.e. "p") is in some old region (must
be for the crashing case according to logs), i.e. we are scanning an
object recently promoted into some old region: if the "obj" reference
is in any kind of old region, the region attribute should have been
stable (which it apparently is not) because either
  - that region was allocated during that gc. Its
  HeapRegionRemSet::_state (and region attribute) must have been "Empty"
  due to above.
  - that region was allocated before gc: it may have any
  HeapRegionRemSet::_state, but since we update that before the parallel
  phases, it must be stable.
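The allocation-time rules described above can be condensed into a small
sketch. The enum and function names are hypothetical and only encode
what this mail states; they are not the real G1RemSetTrackingPolicy.

```cpp
// Toy encoding of the reasoning above; NOT the actual HotSpot code.
enum class ToyKind { Survivor, Old };
enum class ToyRemSet { Empty, Complete };

// Remembered set state given to a region allocated during GC:
// survivor regions start "Complete", old regions start "Empty".
inline ToyRemSet state_at_allocate(ToyKind kind) {
  return kind == ToyKind::Survivor ? ToyRemSet::Complete
                                   : ToyRemSet::Empty;
}

// Whether the scan of "p" can reach the card enqueue path at all:
// it is skipped while the scanning thread is in a young region.
inline bool may_reach_enqueue(bool scanning_in_young) {
  return !scanning_in_young;
}
```

Under these rules an old region with a "Complete" remembered set, as in
the dump, cannot have been allocated during the current gc.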
-------------------
So unfortunately I do not know right away how
region_attr.needs_remembered_set_update() could get inconsistent with
the corresponding HeapRegionRemSet::_state in this context.
At least the assert fired before we missed some remembered set entry....
Can you file a bug and assign it to me? I will think about it some
more. Or maybe you can spot the problem(s) in my thinking?
Thanks,
Thomas