RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled
William Kemper
wkemper at openjdk.org
Tue Mar 18 23:01:08 UTC 2025
On Tue, 18 Mar 2025 22:23:23 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> The sequence of events that creates this state:
>> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake
>> 2. The regulator thread cancels old marking to start a young collection
>> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection
>> 4. Step `3` fails (because of this bug) and the cancellation reason does _not_ become `allocation failure`
>> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting`
>> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`.
>
> src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp line 243:
>
>> 241: assert (new_value < (sizeof(ShenandoahSharedValue) * CHAR_MAX), "sanity");
>> 242: // Hmm, no platform template specialization defined for exchanging one byte... (up cast to intptr is workaround).
>> 243: return (T)Atomic::xchg((intptr_t*)&value, (intptr_t)new_value);
>
> That... likely gets awkward on different endianness. See the complicated dance `Atomic::CmpxchgByteUsingInt` has to do to handle it.
>
> Not to mention we are likely writing to an adjacent memory location. Which is _currently_ innocuous, since we hit padding, but it is not very reliable.
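To make the two concerns quoted above concrete, here is a minimal sketch. It is not HotSpot code: `Slot`, `widened_exchange` and `is_little_endian` are hypothetical names, and the wide access is simulated with `memcpy` on a byte buffer so the example stays well defined. It assumes `sizeof(intptr_t)` is at least 4.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a one-byte shared value followed by padding.
// bytes[0] plays the role of the flag; the rest are its neighbours.
struct Slot {
  unsigned char bytes[sizeof(intptr_t)];
};

// Simulates exchanging through an intptr_t* that points at a one-byte field:
// every byte of the slot is overwritten, not just the first one.
// (Not atomic; this only illustrates which bytes are touched.)
inline intptr_t widened_exchange(Slot& s, intptr_t new_value) {
  intptr_t old_value;
  std::memcpy(&old_value, s.bytes, sizeof(old_value));
  std::memcpy(s.bytes, &new_value, sizeof(new_value));
  return old_value;
}

// Reports whether the lowest-addressed byte holds the least significant bits.
inline bool is_little_endian() {
  const uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

With the slot pre-filled with a marker and a new value of `2`, the neighbouring bytes lose the marker on any platform, and the flag byte itself only receives `2` on a little-endian machine. On a big-endian machine the `2` lands in the last byte of the slot instead.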
`PlatformCmpxchg` has specializations on aarch64 and x86 for `sizeof(T) == 1`. Should we also add platform specializations for `PlatformXchg` for `sizeof(T) == 1`? (It has them for `4` and `8`). Could also do what `XchgUsingCmpxchg` does...
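For illustration, the compare-and-swap fallback could look roughly like the sketch below. It uses standard C++ `std::atomic` rather than HotSpot's `Atomic` class, and `exchange_via_cmpxchg` is a hypothetical name, so treat it as the general idea of emulating an exchange with a CAS retry loop, not as what `XchgUsingCmpxchg` actually contains.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, not a HotSpot API: emulates a one-byte atomic
// exchange using only a one-byte compare-and-swap.
inline uint8_t exchange_via_cmpxchg(std::atomic<uint8_t>& dest, uint8_t new_value) {
  uint8_t old_value = dest.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak stores the value currently in dest
  // into old_value, so the loop retries against fresh contents.
  while (!dest.compare_exchange_weak(old_value, new_value,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
    // Retry until the swap succeeds.
  }
  // The value that was replaced by the successful swap.
  return old_value;
}
```

Because only a single byte is ever read or written, this shape avoids both the endianness question and the write into adjacent memory, at the cost of a retry loop under contention.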
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2002137199