RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2]
William Kemper
wkemper at openjdk.org
Fri Dec 5 18:53:37 UTC 2025
On Fri, 5 Dec 2025 18:50:08 GMT, William Kemper <wkemper at openjdk.org> wrote:
>> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
> Set requested gc cause under a lock when allocation fails
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 145:
> 143: // Notifies the control thread, but does not update the requested cause or generation.
> 144: // The overloaded variant should be used when the _control_lock is already held.
> 145: void notify_cancellation(GCCause::Cause cause);
These methods were the root cause here. `ShenandoahHeap::_canceled_gc` is read/written atomically, but `ShenandoahGenerationalControlThread::_requested_gc_cause` is read/written under a lock. These `notify_cancellation` methods did _not_ update `_requested_gc_cause` at all. So, in the failure I observed we had:
1. Control thread finishes cycle and sees no cancellation is requested (no lock used).
2. Mutator thread fails allocation, cancels GC (again, no lock used), and does _not_ change `_requested_gc_cause`.
3. Control thread takes `_control_lock`, checks `_requested_gc_cause`, sees `_no_gc` (because `notify_cancellation` didn't change it), and then `wait`s forever.
The fix here is to replace `notify_cancellation` with `notify_control_thread`, which serializes updates to `_requested_gc_cause` under `_control_lock`.
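To make the interleaving concrete, here is a minimal standalone sketch of the lost notification and of the fix. It is not the HotSpot code: `ControlThreadModel`, `Cause`, `notify_cancellation_only`, `cancellation_requested`, and `wait_for_request` are hypothetical names, `std::mutex`/`std::condition_variable` stand in for `_control_lock`, and the wait is bounded by a timeout so the example terminates (the real control thread would block indefinitely). The three steps are replayed sequentially on one thread to keep the outcome deterministic.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the relevant GCCause::Cause values.
enum class Cause { no_gc, allocation_failure };

// Hypothetical model of the control thread's notification state.
class ControlThreadModel {
 public:
  // Old behavior (modeled): set the atomic cancellation flag and notify,
  // but never update the requested cause.
  void notify_cancellation_only() {
    cancelled_.store(true);
    std::lock_guard<std::mutex> guard(control_lock_);
    control_cv_.notify_all();
  }

  // Fixed behavior (modeled): record the cause and notify under the lock.
  void notify_control_thread(Cause cause) {
    std::lock_guard<std::mutex> guard(control_lock_);
    requested_cause_ = cause;
    control_cv_.notify_all();
  }

  // Lock-free check, as in step 1 of the failure.
  bool cancellation_requested() const { return cancelled_.load(); }

  // Control thread side: wait until a cause is requested or the timeout
  // expires. Returns the cause observed and resets it to no_gc.
  Cause wait_for_request(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(control_lock_);
    control_cv_.wait_for(lock, timeout,
                         [this] { return requested_cause_ != Cause::no_gc; });
    Cause observed = requested_cause_;
    requested_cause_ = Cause::no_gc;
    return observed;
  }

 private:
  std::mutex control_lock_;
  std::condition_variable control_cv_;
  Cause requested_cause_ = Cause::no_gc;  // guarded by control_lock_
  std::atomic<bool> cancelled_{false};    // read/written without the lock
};

// Replays the three steps for both variants.
void demo() {
  using std::chrono::milliseconds;

  ControlThreadModel buggy;
  assert(!buggy.cancellation_requested());  // 1. no cancellation seen
  buggy.notify_cancellation_only();         // 2. mutator cancels, cause untouched
  // 3. control thread sees no_gc and would wait forever without the timeout.
  assert(buggy.wait_for_request(milliseconds(20)) == Cause::no_gc);

  ControlThreadModel fixed;
  assert(!fixed.cancellation_requested());
  fixed.notify_control_thread(Cause::allocation_failure);
  // The cause was recorded under the lock, so the wait returns immediately.
  assert(fixed.wait_for_request(milliseconds(20)) ==
         Cause::allocation_failure);
}
```

In the buggy variant the notification itself is delivered, but the predicate the waiter checks never changes, so the wakeup is effectively lost. Recording the cause under the same lock the waiter uses closes that window.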
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2593632599