RFR: 8277434: tests fail with "assert(is_forwarded()) failed: only decode when actually forwarded"
Kim Barrett
kbarrett at openjdk.java.net
Sun Nov 28 20:10:22 UTC 2021
Please review this change which fixes a rare assert failure in
G1ParScanThreadState::write_ref_field_post. The problem is that the assert is
referencing values from the forwardee, and that isn't safe. When an object is
copied and forwarded to the copy, the forwarding is performed with a relaxed
cmpxchg. This means the data being copied into the forwardee might not be
visible to a different thread that has obtained the forwardee. All uses of a
forwardee must take care to avoid reading from it without performing
additional synchronization. The assert in question violates that restriction.
The incorrect assertion is
assert(dest_attr.is_in_cset() == (obj->is_forwarded() && obj->forwardee() == obj), "...");
obj is the forwardee of a different object.
It's okay here to examine the contents of obj when it's in the cset. In
that case it's not a copy; it's the original object, self-forwarded due to
evacuation failure.
But it's not okay to examine the contents of obj when it's not in the cset,
e.g. when it's a forwardee copy of the original object. Doing so violates
the above restriction.
So that tells us this code is wrong, but it's still not obvious this is the
cause of the failure in question. The failure we're seeing appears to be
that obj->is_forwarded() returns true, but the assert of the same thing in
forwardee returns false. But is_forwarded() should never be true for a
copied forwardee. But it turns out a copied forwardee might temporarily
appear to be forwarded, due to the unordered copy + forward sequence.
Consider the following scenario:
A thread is evacuating an object and allocates a candidate forwardee, copies
into it, and finally discovers the forwarding race was lost even before the
copy. The candidate forwardee appears to be marked as forwarded. That
candidate is deallocated and the real forwardee is used by the thread.
That same thread evacuates a second object, reallocating some of the same
memory as from the previous candidate. It copies the second object into the
new forwardee, successfully performs the forwarding, and uses that newly
created forwardee.
A second thread attempts to evacuate that same second object, and finds it
already forwarded. It goes into write_ref_field_post, reaches the assert,
and attempts to use values from the forwardee. Because the copy and the
forwarding by the first thread were unordered, this second thread might see
the old copy that appears to be forwarded, rather than the up to date copy.
So the first call to is_forwarded() returns true. The forwardee() assert of
the same thing might instead get the up to date contents, resulting in the
reported failure. Alternatively, it might still get old data, in which case
nothing gets noticed because the self-forward check will fail and the
write_ref_field_post assert will (accidentally) succeed.
So there's a very small window in which the reported failure can occur. But
it's all caused by the write_ref_field_post assert performing invalid
accesses. Stop doing that and the problem should go away.
I think this failure couldn't have happened before JDK-8271579, but that's
kind of accidental and doesn't change that the write_ref_field_post assert
was performing invalid accesses. (Those accesses were introduced later.)
Testing:
mach5 tier1-5.
Ran applications/jcstress/acqrel.java 100 times without failure.
Without the change I had a 25-30% failure rate for that test.
I don't know why that test was such a good canary, when this failure seems to
otherwise be pretty rare, but glad that it was.
-------------
Commit messages:
- remove test from problemlist
- avoid assert based on possibly bad data
Changes: https://git.openjdk.java.net/jdk/pull/6582/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6582&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8277434
Stats: 10 lines in 3 files changed: 6 ins; 4 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/6582.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6582/head:pull/6582
PR: https://git.openjdk.java.net/jdk/pull/6582
More information about the hotspot-gc-dev
mailing list