RFR: 8326936: RISC-V: Shenandoah GC crashes due to incorrect atomic memory operations
Robbin Ehn
rehn at openjdk.org
Wed Feb 28 18:09:52 UTC 2024
On Wed, 28 Feb 2024 09:55:37 GMT, MaxXing <duke at openjdk.org> wrote:
> With compressed oops enabled, Shenandoah GC crashes in the concurrent marking phase due to some incorrect atomic memory operations. This resulted in the failure of multiple related tests, including `gc/shenandoah/TestSmallHeap.java`, `gc/metaspace/TestMetaspacePerfCounters.java#Shenandoah-64`, and so on, tested on XuanTie C910 and QEMU.
>
> This failure is related to a GCC bug we recently discovered: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114130.
>
> In detail, there's a pattern in method `ShenandoahHeap::conc_update_with_forwarded`:
>
>
> if (in_collection_set(obj)) {
> // Dereferences `obj` without explicit null check.
> oop fwd = ShenandoahBarrierSet::resolve_forwarded_not_null(obj);
> // Then calls atomic built-in.
> atomic_update_oop(fwd, p, obj);
> }
>
>
> `atomic_update_oop` then compresses `obj` into a 32-bit word and calls the built-in `__atomic_compare_exchange`. The latter produces incorrect assembly that comparing this unsigned 32-bit word with the sign-extended result of `lr.w` instructions.
>
> This bug can be bypassed by adding an explicit null check (like `if (obj && in_collection_set(obj))`), or adding compiler flag `-fno-delete-null-pointer-checks`.
>
> In previous commits, JDK-8316186 removed RISC-V's `PlatformCmpxchg<4>` but left the flag `-fno-delete-null-pointer-checks` enabled. Then JDK-8316893 removed the flag and made the bug visible. This patch adds `PlatformCmpxchg<4>` back.
Hi, on vacation, back next week.
In this case old_value is unsigned, but in the assembly we loaded a sign extended value.
Return value thus seems always singed extended, as we plainly return old_value.
As the compiler can't know it should zero extend an 'uint32_t' on return.
So this does not look correct to me?
Anyhow can't we just use a int32_t when comparing, and cast that back to the 'uint32_t' on return? (and still use __atomic_...)
Sorry not really available, as I said back next week.
Thanks for finding!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18039#issuecomment-1969555152
More information about the hotspot-runtime-dev
mailing list