RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled

Aleksey Shipilev shade at openjdk.org
Tue Mar 18 22:26:07 UTC 2025


On Tue, 18 Mar 2025 21:51:34 GMT, William Kemper <wkemper at openjdk.org> wrote:

> The sequence of events that creates this state:
> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake
> 2. The regulator thread cancels old marking to start a young collection
> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection
> 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure`
> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting`
> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`.

(too tired to do a full review, just mentioning a thing, so we look at it tomorrow)

src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp line 243:

> 241:     assert (new_value < (sizeof(ShenandoahSharedValue) * CHAR_MAX), "sanity");
> 242:     // Hmm, no platform template specialization defined for exchanging one byte... (up cast to intptr is workaround).
> 243:     return (T)Atomic::xchg((intptr_t*)&value, (intptr_t)new_value);

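That likely breaks on big-endian platforms, where the byte at `&value` is the most significant byte of the widened word rather than the least significant one. See the complicated dance `Atomic::CmpxchgByteUsingInt` has to do to handle it.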
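
To make the endianness concern concrete, here is a minimal sketch of exchanging a single byte through a word-sized compare-and-swap. It is written against standard C++ `std::atomic`, not HotSpot's `Atomic` API, and the names `xchg_byte_in_word` and `is_little_endian` are hypothetical. It only illustrates the general masking-and-shifting technique, and is not a copy of what `Atomic::CmpxchgByteUsingInt` does.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helper (assumption, not HotSpot code): detect byte order at run time.
inline bool is_little_endian() {
  const uint32_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Hypothetical helper (assumption, not HotSpot code): atomically replace the byte
// at memory offset `index` (0..3) inside an aligned 32-bit word and return the
// previous byte. The other three bytes are preserved.
inline uint8_t xchg_byte_in_word(std::atomic<uint32_t>& word, unsigned index, uint8_t new_value) {
  // The bit position of a given memory byte depends on the byte order.
  const unsigned shift = (is_little_endian() ? index : 3 - index) * 8;
  const uint32_t mask = uint32_t(0xFF) << shift;

  uint32_t old_word = word.load(std::memory_order_relaxed);
  uint32_t new_word;
  do {
    // Keep the neighbouring bytes, swap in only the target byte.
    new_word = (old_word & ~mask) | (uint32_t(new_value) << shift);
  } while (!word.compare_exchange_weak(old_word, new_word));

  return uint8_t((old_word & mask) >> shift);
}
```

The point of the sketch is that the shift has to be chosen per byte order and the neighbouring bytes have to be carried through the CAS; a plain word-sized exchange does neither.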

Not to mention that the word-sized exchange likely writes to adjacent memory locations. That is _currently_ innocuous, since we hit padding, but it is not very reliable.
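
A rough way to see the adjacent-memory effect without invoking undefined behaviour is to model the store half of the exchange on a plain byte buffer. This is only a simulation under stated assumptions: `simulate_wide_store` and `WideStoreResult` are hypothetical names, `buffer[0]` stands in for the one-byte shared value, and the remaining bytes stand in for whatever follows it in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type for the simulation below.
struct WideStoreResult {
  uint8_t flag;             // byte at the flag's own address after the store
  bool neighbours_changed;  // whether any following byte was overwritten
};

// Hypothetical model of storing a byte-sized value through an intptr_t-sized write.
inline WideStoreResult simulate_wide_store(uint8_t new_value, uint8_t neighbour_fill) {
  uint8_t buffer[sizeof(intptr_t)];
  std::memset(buffer, neighbour_fill, sizeof(buffer));

  // Widen the value and write all sizeof(intptr_t) bytes, as the up-cast does.
  const intptr_t wide = static_cast<intptr_t>(new_value);
  std::memcpy(buffer, &wide, sizeof(wide));

  bool changed = false;
  for (size_t i = 1; i < sizeof(buffer); i++) {
    if (buffer[i] != neighbour_fill) changed = true;
  }
  return {buffer[0], changed};
}
```

On a little-endian machine the flag byte ends up holding the new value; on a big-endian machine it ends up zero, with the new value landing in the last byte of the word. In both cases the bytes after the flag are overwritten, which is harmless only as long as they are padding.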

-------------

Changes requested by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24105#pullrequestreview-2696449916
PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2002110190

