RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4]
Xiaolong Peng
xpeng at openjdk.org
Wed Mar 12 17:27:55 UTC 2025
On Wed, 12 Mar 2025 16:40:40 GMT, William Kemper <wkemper at openjdk.org> wrote:
>> src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 171:
>>
>>> 169:
>>> 170: // If this cycle completed without being cancelled, notify waiters about it
>>> 171: if (!heap->cancelled_gc()) {
>>
>> I feel we should remove the test `!heap->cancelled_gc()` here, if is fine if there is single mutator thread, but in most cases there are mutator threads, then the following case could happen:
>> 1. **Mutator A** try to cancel GC and notify control thread, it will wait with `_alloc_failure_waiters_lock`, `_cancelled_cause` is set to `_allocation_failure`
>> 2. Concurrent GC clear `_cancelled_cause` and set it to `_no_gc` in op_final_update_refs
>> 3. **Mutator B** try to cancel GC and successfully set `_cancelled_cause` to `_allocation_failure` again.
>> 4. Concurrent GC finishes.
>> 5. Control thread check `!heap->cancelled_gc()` which is false, and won't wake up mutators.
>>
>> In this case, it will delay the wake up for mutator A & B to next cycle.
>
> I believe that is the correct behavior. The mutators are waiting until there is memory available. If mutator B cannot allocate, there is no reason to believe mutator A would be able to allocate. In this case, it is fine for both mutators to wait (even if it means A has to wait an extra cycle).
Thanks for the explanation, re-read the relevant codes I think it make sense, when Mutator B fails to allocate when Concurrent GC is at `op_final_update_refs`, very unlikely there is enough space for Mutator A.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r1991968117
More information about the shenandoah-dev
mailing list