RFR: 8277434: tests fail with "assert(is_forwarded()) failed: only decode when actually forwarded" [v2]

Kim Barrett kbarrett at openjdk.java.net
Tue Nov 30 20:26:44 UTC 2021


> Please review this change which fixes a rare assert failure in
> G1ParScanThreadState::write_ref_field_post. The problem is that the assert is
> referencing values from the forwardee, and that isn't safe. When an object is
> copied and forwarded to the copy, the forwarding is performed with a relaxed
> cmpxchg. This means the data being copied into the forwardee might not be
> visible to a different thread that has obtained the forwardee. All uses of a
> forwardee must take care to avoid reading from it without performing
> additional synchronization. The assert in question violates that restriction.
> 
> The incorrect assertion is
> 
> assert(dest_attr.is_in_cset() == (obj->is_forwarded() && obj->forwardee() == obj), "...");
> 
> obj is the forwardee of a different object.
> 
> It's okay here to examine the contents of obj when it's in the cset.  In
> that case it's not a copy; it's the original object, self-forwarded due to
> evacuation failure.
> 
> But it's not okay to examine the contents of obj when it's not in the cset,
> e.g. when it's a forwardee copy of the original object.  Doing so violates
> the above restriction.
> 
> So that tells us this code is wrong, but it's still not obvious this is the
> cause of the failure in question.  The failure we're seeing appears to be
> that obj->is_forwarded() returns true, but the assert of the same thing in
> forwardee returns false.  But is_forwarded() should never be true for a
> copied forwardee.  But it turns out a copied forwardee might temporarily
> appear to be forwarded, due to the unordered copy + forward sequence.
> Consider the following scenario:
> 
> A thread is evacuating an object and allocates a candidate forwardee, copies
> into it, and finally discovers the forwarding race was lost even before the
> copy.  The candidate forwardee appears to be marked as forwarded.  That
> candidate is deallocated and the real forwardee is used by the thread.
> 
> That same thread evacuates a second object, reallocating some of the same
> memory as from the previous candidate.  It copies the second object into the
> new forwardee, successfully performs the forwarding, and uses that newly
> created forwardee.
> 
> A second thread attempts to evacuate that same second object, and finds it
> already forwarded.  It goes into write_ref_field_post, reaches the assert,
> and attempts to use values from the forwardee.  Because the copy and the
> forwarding by the first thread were unordered, this second thread might see
> the old copy that appears to be forwarded, rather than the up to date copy.
> So the first call to is_forwarded() returns true.  The forwardee() assert of
> the same thing might instead get the up to date contents, resulting in the
> reported failure.  Alternatively, it might still get old data, in which case
> nothing gets noticed because the self-forward check will fail and the
> write_ref_field_post assert will (accidentally) succeed.
> 
> So there's a very small window in which the reported failure can occur.  But
> it's all caused by the write_ref_field_post assert performing invalid
> accesses.  Stop doing that and the problem should go away.
> 
> I think this failure couldn't have happened before JDK-8271579, but that's
> kind of accidental and doesn't change that the write_ref_field_post assert
> was performing invalid accesses.  (Those accesses were introduced later.)
> 
> Testing:
> mach5 tier1-5.
> 
> Ran applications/jcstress/acqrel.java 100 times without failure.
> Without the change I had a 25-30% failure rate for that test.
> I don't know why that test was such a good canary, when this failure seems to
> otherwise be pretty rare, but glad that it was.

Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into fix_forwarding_assert
 - remove test from problemlist
 - avoid assert based on possibly bad data

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6582/files
  - new: https://git.openjdk.java.net/jdk/pull/6582/files/25daf289..4b353c4f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6582&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6582&range=00-01

  Stats: 1461 lines in 73 files changed: 606 ins; 581 del; 274 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6582.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6582/head:pull/6582

PR: https://git.openjdk.java.net/jdk/pull/6582



More information about the hotspot-gc-dev mailing list