RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4]

William Kemper wkemper at openjdk.org
Wed Mar 12 16:42:59 UTC 2025


On Wed, 12 Mar 2025 00:24:46 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   The non-generational modes may also fail to notify waiters
>
> src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 171:
> 
>> 169: 
>> 170:       // If this cycle completed without being cancelled, notify waiters about it
>> 171:       if (!heap->cancelled_gc()) {
> 
> I feel we should remove the test `!heap->cancelled_gc()` here, if is fine if there is single mutator thread, but in most cases there are mutator threads, then the following case could happen:
> 1. **Mutator A** try to cancel GC and notify control thread, it will wait with `_alloc_failure_waiters_lock`, `_cancelled_cause` is set to `_allocation_failure` 
> 2. Concurrent GC clear `_cancelled_cause` and set it to `_no_gc` in op_final_update_refs 
> 3.  **Mutator B** try to cancel GC and successfully set  `_cancelled_cause` to `_allocation_failure` again.
> 4. Concurrent GC finishes.
> 5. Control thread check `!heap->cancelled_gc()` which is false, and won't wake up mutators.
> 
> In this case, it will delay the wake up for mutator A & B to next cycle.

I believe that is the correct behavior. The mutators are waiting until there is memory available. If mutator B cannot allocate, there is no reason to believe mutator A would be able to allocate. In this case, it is fine for both mutators to wait (even if it means A has to wait an extra cycle).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r1991893985


More information about the shenandoah-dev mailing list