RFR: 8308507: G1: GClocker induced GCs can starve threads requiring memory leading to OOME [v4]
Albert Mingkun Yang
ayang at openjdk.org
Thu Jun 1 09:36:19 UTC 2023
On Wed, 31 May 2023 08:38:43 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:
>> Please review this change which fixes the thread starvation problem during allocation for G1.
>>
>> The starvation problem is not limited to GCLocker, however, currently, it manifests as an OOME only when GCLocker is active. In other cases, the starvation only affects the "starved" thread as it may loop indefinitely.
>>
>> Starvation with an active GCLocker happens as below:
>>
>> 1. Thread A tries to allocate memory as normal, and tries to start a GC; the GCLocker is active and so the thread gets stalled waiting for the GC.
>> 2. GCLocker induced GC executes and frees some memory.
>> 3. Thread A does not get any of that memory, but other threads also waiting for memory.
>> 4. Goto 1 until the gclocker retry count has been reached.
>>
>> In this change, we take the general approach to solving starvation problems with announcement tables (request queues). On slow allocation, a thread that wishes to complete an Allocation GC and then attempt an allocation, announces its allocation request before proceeding to participate in a race to execute a GC safepoint. Whichever thread succeeds in executing the Allocation GC safepoint will be tasked with completing all allocation requests that were announced before the safepoint. This guarantees that all announced allocation requests are either satisfied during the safepoint, or failed in case there is not enough memory to complete all requests. This effectively deals with the starvation issue and reduces the number of allocation GCs triggered.
>>
>> Note: The change also adopts ZList from ZGC and makes it available under utilities as DoublyLinkedList with slight modifications.
>>
>> Testing: Tier 1-7
>
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
>
> Thomas review
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 419:
> 417:
> 418: if (should_try_gc) {
> 419: bool succeeded = false;
Why is this var needed?
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 502:
> 500: }
> 501:
> 502: if (attempt_allocation_after_gc(word_size, gc_count_before, should_try_gc, &result, GCCause::_g1_inc_collection_pause)) {
When this returns `false`, it would loop. Why is it safe/correct to discard `GCLockerRetryAllocationCount` here? (Can't find it in G1 code.)
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1049:
> 1047: bool *expect_null_alloc_regions = (bool*)alloca(active_numa_nodes * sizeof(bool));
> 1048: for (uint i = 0; i < active_numa_nodes; i++) {
> 1049: expect_null_alloc_regions[i] = expect_null_mutator_alloc_region;
AFAICS, this array is just for assert eventually, right? If so, how useful is that assert? Can it be removed?
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1091:
> 1089: // Fill the allocated memory with filler objects.
> 1090: CollectedHeap::fill_with_objects(result, alloc_req->size());
> 1091: }
Maybe some words on why fillers are required? Not obvious at first glance.
src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 224:
> 222: Mutex _alloc_request_lock;
> 223: DoublyLinkedList<StalledAllocReq> _stalled_allocations;
> 224: DoublyLinkedList<StalledAllocReq> _satisfied_allocations;
Not super obvious to me why two lists are needed since each element has `state` already. Can they be merged into one? Then, one can avoid the list manipulation in the following example.
alloc_req->set_state(StalledAllocReq::AllocationState::Success, result);
...
// Move the allocation request from stalled to satisfied list.
_stalled_allocations.remove(alloc_req);
_satisfied_allocations.insert_last(alloc_req);
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1212394009
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1212399991
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1212400889
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1212401798
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1212398469
More information about the hotspot-gc-dev
mailing list