RFR: 8277434: tests fail with "assert(is_forwarded()) failed: only decode when actually forwarded"
Kim Barrett
kbarrett at openjdk.java.net
Tue Nov 30 20:13:02 UTC 2021
On Mon, 29 Nov 2021 08:06:35 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:
>> Please review this change which fixes a rare assert failure in
>> G1ParScanThreadState::write_ref_field_post. The problem is that the assert is
>> referencing values from the forwardee, and that isn't safe. When an object is
>> copied and forwarded to the copy, the forwarding is performed with a relaxed
>> cmpxchg. This means the data being copied into the forwardee might not be
>> visible to a different thread that has obtained the forwardee. All uses of a
>> forwardee must take care to avoid reading from it without performing
>> additional synchronization. The assert in question violates that restriction.
>>
>> The incorrect assertion is
>>
>> assert(dest_attr.is_in_cset() == (obj->is_forwarded() && obj->forwardee() == obj), "...");
>>
>> obj is the forwardee of a different object.
>>
>> It's okay here to examine the contents of obj when it's in the cset. In
>> that case it's not a copy; it's the original object, self-forwarded due to
>> evacuation failure.
>>
>> But it's not okay to examine the contents of obj when it's not in the cset,
>> e.g. when it's a forwardee copy of the original object. Doing so violates
>> the above restriction.
>>
>> So that tells us this code is wrong, but it's still not obvious this is the
>> cause of the failure in question. The failure we're seeing appears to be
>> that obj->is_forwarded() returns true, but the assert of the same thing in
>> forwardee returns false. But is_forwarded() should never be true for a
>> copied forwardee. But it turns out a copied forwardee might temporarily
>> appear to be forwarded, due to the unordered copy + forward sequence.
>> Consider the following scenario:
>>
>> A thread is evacuating an object and allocates a candidate forwardee, copies
>> into it, and finally discovers the forwarding race was lost even before the
>> copy. The candidate forwardee appears to be marked as forwarded. That
>> candidate is deallocated and the real forwardee is used by the thread.
>>
>> That same thread evacuates a second object, reallocating some of the same
>> memory as from the previous candidate. It copies the second object into the
>> new forwardee, successfully performs the forwarding, and uses that newly
>> created forwardee.
>>
>> A second thread attempts to evacuate that same second object, and finds it
>> already forwarded. It goes into write_ref_field_post, reaches the assert,
>> and attempts to use values from the forwardee. Because the copy and the
>> forwarding by the first thread were unordered, this second thread might see
>> the old copy that appears to be forwarded, rather than the up to date copy.
>> So the first call to is_forwarded() returns true. The forwardee() assert of
>> the same thing might instead get the up to date contents, resulting in the
>> reported failure. Alternatively, it might still get old data, in which case
>> nothing gets noticed because the self-forward check will fail and the
>> write_ref_field_post assert will (accidentally) succeed.
>>
>> So there's a very small window in which the reported failure can occur. But
>> it's all caused by the write_ref_field_post assert performing invalid
>> accesses. Stop doing that and the problem should go away.
>>
>> I think this failure couldn't have happened before JDK-8271579, but that's
>> kind of accidental and doesn't change that the write_ref_field_post assert
>> was performing invalid accesses. (Those accesses were introduced later.)
>>
>> Testing:
>> mach5 tier1-5.
>>
>> Ran applications/jcstress/acqrel.java 100 times without failure.
>> Without the change I had a 25-30% failure rate for that test.
>> I don't know why that test was such a good canary, when this failure seems to
>> otherwise be pretty rare, but glad that it was.
>
> Thank you for the detailed analysis; this makes perfect sense.
Thanks for reviews @albertnetymk , @shipilev , and @tschatzl .
-------------
PR: https://git.openjdk.java.net/jdk/pull/6582
More information about the hotspot-gc-dev
mailing list