RFR: 8308507: G1: GClocker induced GCs can starve threads requiring memory leading to OOME [v3]
Thomas Schatzl
tschatzl at openjdk.org
Tue May 30 10:14:11 UTC 2023
On Tue, 30 May 2023 08:38:42 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:
>> Please review this change which fixes the thread starvation problem during allocation for G1.
>>
>> The starvation problem is not limited to GCLocker, however, currently, it manifests as an OOME only when GCLocker is active. In other cases, the starvation only affects the "starved" thread as it may loop indefinitely.
>>
>> Starvation with an active GCLocker happens as below:
>>
>> 1. Thread A tries to allocate memory as normal, and tries to start a GC; the GCLocker is active and so the thread gets stalled waiting for the GC.
>> 2. GCLocker induced GC executes and frees some memory.
>> 3. Thread A does not get any of that memory, but other threads also waiting for memory.
>> 4. Goto 1 until the gclocker retry count has been reached.
>>
>> In this change, we take the general approach to solving starvation problems with announcement tables (request queues). On slow allocation, a thread that wishes to complete an Allocation GC and then attempt an allocation, announces its allocation request before proceeding to participate in a race to execute a GC safepoint. Whichever thread succeeds in executing the Allocation GC safepoint will be tasked with completing all allocation requests that were announced before the safepoint. This guarantees that all announced allocation requests are either satisfied during the safepoint, or failed in case there is not enough memory to complete all requests. This effectively deals with the starvation issue and reduces the number of allocation GCs triggered.
>>
>> Note: The change also adopts ZList from ZGC and makes it available under utilities as DoublyLinkedList with slight modifications.
>>
>> Testing: Tier 1-7
>
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
>
> Thomas review
Lgtm.
Please fix the remaining additional minor nits before pushing. Some of the method descriptions need to be adapted (maybe add mention of the `node_index` param and such) too.
src/hotspot/share/gc/g1/g1Allocator.cpp line 75:
> 73: return has_mutator_alloc_region(node_index);
> 74: }
> 75:
Seems unused now and can be removed.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 446:
> 444: // point in trying to allocate further. We'll just return null.
> 445: log_debug(gc, alloc)("%s: Failed to allocate "
> 446: SIZE_FORMAT " words", Thread::current()->name(), word_size);
Suggestion:
log_debug(gc, alloc)("%s: Failed to allocate %zu words", Thread::current()->name(), word_size);
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1088:
> 1086: const size_t payload_size = words - CollectedHeap::filler_array_hdr_size();
> 1087: const size_t len = payload_size * HeapWordSize / sizeof(jint);
> 1088: assert((int)len >= 0, "size too large " SIZE_FORMAT " becomes %d", words, (int)len);
Suggestion:
assert((int)len >= 0, "size too large %zu becomes %d", words, (int)len);
Use `%zu` for any new code.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1113:
> 1111:
> 1112: size_t expand_bytes = MAX2(word_size * HeapWordSize, MinHeapDeltaBytes);
> 1113: log_debug(gc, ergo, heap)("Attempt heap expansion (allocation request failed). Allocation request: " SIZE_FORMAT "B",
Since this method is touched, maybe fix the `SIZE_FORMAT` here too.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1124:
> 1122: }
> 1123:
> 1124: HeapWord* G1CollectedHeap::expand_and_allocate(size_t word_size) {
I think this method is unused now. Removing it also seems to orphan `attempt_allocation_at_safepoint(size_t, bool)`.
-------------
Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14077#pullrequestreview-1450520679
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1210048939
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1210007957
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1210005617
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1210007184
PR Review Comment: https://git.openjdk.org/jdk/pull/14077#discussion_r1210046698
More information about the hotspot-gc-dev
mailing list