RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled
Aleksey Shipilev
shade at openjdk.org
Tue Mar 18 22:26:07 UTC 2025
On Tue, 18 Mar 2025 21:51:34 GMT, William Kemper <wkemper at openjdk.org> wrote:
> The sequence of events that creates this state:
> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake
> 2. The regulator thread cancels old marking to start a young collection
> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection
> 4. Step `3` fails (because of this bug) and the cancellation reason does _not_ become `allocation failure`
> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting`
> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`.
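To make step 4 concrete, here is a deliberately simplified model. It assumes the cancellation cause is a single shared value, and that a mutator only waits once that value says "allocation failure". The enum, its values, and both cancel functions are hypothetical. They are not Shenandoah's actual code or this PR's fix. Under "first cancel wins" semantics, suppose the cause recorded by the regulator is still in place when the young cycle begins. Then the mutator's later attempt to record an allocation failure is rejected, which is the step-4 failure that leads to the unwaited retry loop in step 5.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical cancellation causes, loosely modeled on the scenario above.
enum class CancelCause : uint8_t { None, YoungPreemptsOld, AllocFailure };

struct CancelState {
  std::atomic<CancelCause> cause{CancelCause::None};

  // "First cancel wins": succeeds only when no cause is recorded yet.
  // Once the regulator has recorded YoungPreemptsOld, a mutator's request
  // to record AllocFailure fails here (the step-4 symptom in miniature).
  bool cancel_first_wins(CancelCause c) {
    CancelCause expected = CancelCause::None;
    return cause.compare_exchange_strong(expected, c);
  }

  // Escalating variant: any cause may replace None, AllocFailure may
  // replace YoungPreemptsOld, and nothing replaces AllocFailure.
  bool cancel_escalating(CancelCause c) {
    if (c == CancelCause::None) return false;  // not a real cause
    CancelCause cur = cause.load();
    while (cur != CancelCause::AllocFailure && cur != c) {
      // On failure (including spurious failure) `cur` is reloaded and rechecked.
      if (cause.compare_exchange_weak(cur, c)) return true;
    }
    return false;
  }
};
```

The escalating variant lets the stronger cause win without letting a weaker one overwrite an already-recorded allocation failure.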
(too tired to do a full review, just mentioning one thing so we can look at it tomorrow)
src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp line 243:
> 241: assert (new_value < (sizeof(ShenandoahSharedValue) * CHAR_MAX), "sanity");
> 242: // Hmm, no platform template specialization defined for exchanging one byte... (up cast to intptr is workaround).
> 243: return (T)Atomic::xchg((intptr_t*)&value, (intptr_t)new_value);
That... likely gets awkward on platforms with different endianness. See the complicated dance `Atomic::CmpxchgByteUsingInt` has to do to handle it.
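Here is a minimal sketch of the endianness concern, assuming `intptr_t` is wider than one byte. Widen a one-byte value to `intptr_t` and store the whole word at the flag's address. On little-endian targets the meaningful byte lands at offset 0. On big-endian targets it lands at the last byte, and the byte at the flag's own address becomes zero. The helper name is hypothetical, and `memcpy` stands in for the store performed by the atomic exchange.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: simulates storing (intptr_t)v over the sizeof(intptr_t)
// bytes that start at the flag's address, and reports the byte offset at which
// v ends up. Pass a nonzero v so it is distinguishable from the zero fill.
int offset_of_widened_byte(uint8_t v) {
  unsigned char mem[sizeof(intptr_t)];
  std::memset(mem, 0xAA, sizeof(mem));      // prior contents of the flag and its neighbors
  const intptr_t widened = v;               // like (intptr_t)new_value
  std::memcpy(mem, &widened, sizeof(mem));  // full-width store replaces every byte
  for (int i = 0; i < static_cast<int>(sizeof(mem)); i++) {
    if (mem[i] == v) return i;              // little-endian: 0; big-endian: last byte
  }
  return -1;
}
```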
Not to mention that we are likely writing to adjacent memory locations, which is _currently_ innocuous because we hit padding, but that is not very reliable.
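One way to avoid both problems follows the spirit of what `Atomic::CmpxchgByteUsingInt` does for compare-and-exchange. The idea is to run a CAS loop on the 32-bit word that contains the byte, splicing the new byte in at a shift that depends on endianness. The sketch below is a hypothetical standard-C++ illustration, not HotSpot code. To stay within defined behavior, it takes the containing word and the byte's index within it explicitly. A real implementation would instead derive both from the byte's address.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// True when the lowest-addressed byte of a multi-byte integer is its least
// significant byte.
static bool is_little_endian() {
  const uint32_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}

// Hypothetical helper: atomically exchanges the byte at memory index `idx`
// (0..3) of `word` with `new_value` and returns the previous byte. The three
// neighboring bytes are written back with their current values, so they never
// change.
uint8_t xchg_byte_in_word(std::atomic<uint32_t>& word, unsigned idx, uint8_t new_value) {
  // The bit position of memory index `idx` depends on endianness.
  const unsigned shift = is_little_endian() ? idx * 8 : (3 - idx) * 8;
  const uint32_t mask = uint32_t{0xFF} << shift;
  uint32_t cur = word.load(std::memory_order_relaxed);
  uint32_t desired;
  do {
    desired = (cur & ~mask) | (uint32_t{new_value} << shift);
    // On failure, compare_exchange_weak reloads `cur`, so the splice is redone
    // against the latest neighbor bytes.
  } while (!word.compare_exchange_weak(cur, desired,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed));
  return static_cast<uint8_t>((cur & mask) >> shift);
}
```

The neighbors still take part in the atomic read-modify-write. A concurrent update to an adjacent byte therefore makes the CAS fail and retry instead of being silently lost.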
-------------
Changes requested by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24105#pullrequestreview-2696449916
PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2002110190
More information about the shenandoah-dev mailing list