RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2]

William Kemper wkemper at openjdk.org
Fri Dec 5 18:53:37 UTC 2025


On Fri, 5 Dec 2025 18:50:08 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set requested gc cause under a lock when allocation fails

src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 145:

> 143:   // Notifies the control thread, but does not update the requested cause or generation.
> 144:   // The overloaded variant should be used when the _control_lock is already held.
> 145:   void notify_cancellation(GCCause::Cause cause);

These methods were the root cause here. `ShenandoahHeap::_canceled_gc` is read/written atomically, but `ShenandoahGenerationalControlThread::_requested_gc_cause` is read/written under a lock. These `notify_cancellation` methods did _not_ update `_requested_gc_cause` at all. So, in the failure I observed we had:
1. Control thread finishes cycle and sees no cancellation is requested (no lock used).
2. Mutator thread fails allocation, cancels GC (again, no lock used), and does _not_ change `_requested_gc_cause`.
3. Control thread takes `_control_lock`, checks `_requested_gc_cause`, sees `_no_gc` (because `notify_cancellation` didn't change it), and waits forever.
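
For illustration, the interleaving above can be replayed deterministically on a single thread. This is only a minimal sketch, not HotSpot code: `Model`, `Cause`, `notify_cancellation_old`, `saw_cancellation`, `would_wait` and `replay_missed_notification` are hypothetical stand-ins, and the real control thread logic is considerably more involved.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for GCCause::Cause; not the real declaration.
enum class Cause { no_gc, allocation_failure };

struct Model {
  std::atomic<bool> canceled_gc{false};      // plays the role of ShenandoahHeap::_canceled_gc
  std::mutex control_lock;                   // plays the role of _control_lock
  Cause requested_gc_cause = Cause::no_gc;   // guarded by control_lock

  // Old behavior as described above: cancel the GC, but leave the
  // requested cause untouched.
  void notify_cancellation_old() { canceled_gc.store(true); }

  // Step 1: lock-free check made by the control thread at the end of a cycle.
  bool saw_cancellation() const { return canceled_gc.load(); }

  // Step 3: under the lock, the control thread decides whether to wait.
  bool would_wait() {
    std::lock_guard<std::mutex> guard(control_lock);
    return requested_gc_cause == Cause::no_gc;
  }
};

// Replays steps 1-3 in order. Returns true when the control thread would
// go to sleep even though a cancellation is pending.
bool replay_missed_notification() {
  Model m;
  bool seen = m.saw_cancellation();   // step 1: nothing canceled yet
  m.notify_cancellation_old();        // step 2: mutator cancels, cause unchanged
  bool waits = m.would_wait();        // step 3: cause is still no_gc
  return !seen && waits && m.canceled_gc.load();
}
```

In this model `replay_missed_notification()` returns `true`: the cancellation flag is set, yet the lock-protected state gives the control thread no reason to run a cycle.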


The fix here is to replace `notify_cancellation` with `notify_control_thread`, which serializes updates to `_requested_gc_cause` under `_control_lock`.
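
The general shape of that fix can be sketched with standard C++ primitives. This is an assumption-laden model rather than the actual patch: `ControlThreadModel`, `wait_for_request` and `run_once` are hypothetical names, and `std::mutex`/`std::condition_variable` stand in for HotSpot's monitor. The point it shows is that the requested cause is written and the waiter is woken under the same lock the waiter uses to check the cause.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for GCCause::Cause; not the real declaration.
enum class Cause { no_gc, allocation_failure };

class ControlThreadModel {
  std::mutex control_lock_;                   // plays the role of _control_lock
  std::condition_variable cv_;
  Cause requested_gc_cause_ = Cause::no_gc;   // guarded by control_lock_

public:
  // Models the fixed path: publish the cause and notify under one lock.
  void notify_control_thread(Cause cause) {
    std::lock_guard<std::mutex> guard(control_lock_);
    requested_gc_cause_ = cause;
    cv_.notify_all();
  }

  // Models the control thread: block until a cause is requested, then
  // consume it. The predicate is checked under the lock, so a request made
  // before the wait begins is still observed.
  Cause wait_for_request() {
    std::unique_lock<std::mutex> guard(control_lock_);
    cv_.wait(guard, [this] { return requested_gc_cause_ != Cause::no_gc; });
    Cause cause = requested_gc_cause_;
    requested_gc_cause_ = Cause::no_gc;
    return cause;
  }
};

// Runs one "mutator" against one "control thread" and returns the cause
// the control thread observed.
Cause run_once() {
  ControlThreadModel model;
  Cause observed = Cause::no_gc;
  std::thread control([&] { observed = model.wait_for_request(); });
  std::thread mutator([&] {
    model.notify_control_thread(Cause::allocation_failure);
  });
  mutator.join();
  control.join();
  return observed;
}
```

Whichever thread acquires the lock first, `run_once()` terminates in this model: either the control thread is already waiting and gets woken, or it finds the cause set when it first evaluates the predicate.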

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2593632599

