RFR: 8308507: G1: GClocker induced GCs can starve threads requiring memory leading to OOME [v2]
Thomas Schatzl
tschatzl at openjdk.org
Fri May 26 14:11:05 UTC 2023
On Tue, 23 May 2023 15:12:22 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:
>> Please review this change which fixes the thread starvation problem during allocation for G1.
>>
>> The starvation problem is not limited to GCLocker, however, currently, it manifests as an OOME only when GCLocker is active. In other cases, the starvation only affects the "starved" thread as it may loop indefinitely.
>>
>> Starvation with an active GCLocker happens as below:
>>
>> 1. Thread A tries to allocate memory as normal, and tries to start a GC; the GCLocker is active and so the thread gets stalled waiting for the GC.
>> 2. GCLocker induced GC executes and frees some memory.
>> 3. Thread A does not get any of that memory, but other threads also waiting for memory.
>> 4. Goto 1 until the gclocker retry count has been reached.
>>
>> In this change, we take the general approach to solving starvation problems with announcement tables (request queues). On slow allocation, a thread that wishes to complete an Allocation GC and then attempt an allocation, announces its allocation request before proceeding to participate in a race to execute a GC safepoint. Whichever thread succeeds in executing the Allocation GC safepoint will be tasked with completing all allocation requests that were announced before the safepoint. This guarantees that all announced allocation requests are either satisfied during the safepoint, or failed in case there is not enough memory to complete all requests. This effectively deals with the starvation issue and reduces the number of allocation GCs triggered.
>>
>> Note: The change also adopts ZList from ZGC and makes it available under utilities as DoublyLinkedList with slight modifications.
>>
>> Testing: Tier 1-7
>
> Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision:
>
> - Make explicit checks for unclaimed allocatiions
> - Thomas Review
Changes requested by tschatzl (Reviewer).
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 906:
> 904: bool G1CollectedHeap::upgrade_to_full_collection() {
> 905: GCCauseSetter compaction(this, GCCause::_g1_compaction_pause);
> 906: // Reset any allocated but unclaimed allocation requests.
This comment would be nice (at least in addition) in the declaration in the .hpp file.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 981:
> 979: }
> 980:
> 981: // Attempt to satisfy allocation requests failed; reset the requests, execute a full-gc,
Suggestion:
// Attempt to satisfy allocation requests failed; reset the requests, execute a full gc,
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 982:
> 980:
> 981: // Attempt to satisfy allocation requests failed; reset the requests, execute a full-gc,
> 982: // then try again
Suggestion:
// then try again.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 998:
> 996:
> 997: // Attempt to satisfy allocation requests after full-gc also failed. We reset the allocation requests
> 998: // then execute a maximal compaction full-gc before retrying the allocations
Suggestion:
// then execute a maximal compaction full-gc before retrying the allocations.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1018:
> 1016:
> 1017: for (StalledAllocReq* alloc_req; iter.next(&alloc_req);) {
> 1018: alloc_req->set_state(StalledAllocReq::AllocationState::Failed);
It does not make a difference at this point, but maybe it is somehow useful to keep "Succeeded" requests as such. This loop seems to unconditionally make all requests "Failed".
(I can't think of a situation where this would make a difference, but maybe some threads can handle null return values/OOME, while others don't).
src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 526:
> 524: HeapWord** result,
> 525: GCCause::Cause gc_cause);
> 526:
Indentation seems to be off here.
-------------
PR Review: https://git.openjdk.org/jdk/pull/14077#pullrequestreview-1446169635
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1206744268
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1206744894
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1206745156
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1206745569
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1206736221
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1206743379
More information about the hotspot-gc-dev
mailing list