RFR: 8259643: ZGC can return metaspace OOM prematurely [v3]
Erik Österlund
eosterlund at openjdk.java.net
Mon Nov 15 15:47:51 UTC 2021
On Mon, 15 Nov 2021 15:31:06 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:
>> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
>>
>> 1. full_gc()
>> 2. final_allocation_attempt()
>>
>> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
>>
>> The way we deal with this particular issue for heap allocations is to have an allocation request queue and to satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem is to do exactly that: have a queue of "critical" allocations that take precedence over normal metaspace allocations.
>>
>> The solution should work for other concurrent GCs (which likely have the same issue), but I have only tried it with ZGC, so at this point I am only hooking ZGC into the new API (for concurrently unloading GCs to manage critical metaspace allocations).
>>
>> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>
> - Merge branch 'master' into 8259643_load_unload_bug
> - polish code alignment and rename register/unregister to add/remove
> - 8259643: ZGC can return metaspace OOM prematurely
Sorry, I ran out of steam with this patch a few months ago. It looks like I already had three reviews, so I think I am ready to go. I rebased onto the latest mainline, which required just a small fix to the kind of lock used (a non-safepoint-checking lock with a new rank), due to the recent lock ranking changes.
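
For readers new to the thread, the critical-allocation idea from the quoted description can be sketched roughly as below. This is a toy model in Java, not the HotSpot code: the class name, the method names, the request ids and the byte-counting "metaspace" are all assumptions made for illustration. It only shows the ordering rule: once a failed allocation has been queued as critical, normal allocations are refused until the queued request has been served, so nothing can refill the space between the full GC and the final attempt.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (hypothetical names): a fixed-size space plus a FIFO of critical requests. */
final class CriticalMetaspaceSketch {
    // Ids of requests that already failed once and are waiting to be served.
    private final Deque<Long> criticalQueue = new ArrayDeque<>();
    private long free;

    CriticalMetaspaceSketch(long capacity) {
        this.free = capacity;
    }

    /** Normal path: refused while any critical request is pending, so it cannot starve one. */
    synchronized boolean tryAllocate(long size) {
        if (!criticalQueue.isEmpty()) {
            return false;
        }
        return take(size);
    }

    /** A thread whose normal attempt failed queues itself before the full GC runs. */
    synchronized void addCritical(long requestId) {
        criticalQueue.addLast(requestId);
    }

    /** Critical path: only the oldest pending request may allocate. */
    synchronized boolean tryAllocateCritical(long requestId, long size) {
        Long head = criticalQueue.peekFirst();
        if (head == null || head != requestId) {
            return false;
        }
        if (!take(size)) {
            return false; // still queued; the caller decides whether to give up
        }
        criticalQueue.removeFirst();
        return true;
    }

    /** Withdraw a request, e.g. right before the caller reports OOM. */
    synchronized void removeCritical(long requestId) {
        criticalQueue.remove(requestId);
    }

    /** Stand-in for memory handed back by class unloading during a GC. */
    synchronized void release(long size) {
        free += size;
    }

    private boolean take(long size) {
        if (size > free) {
            return false;
        }
        free -= size;
        return true;
    }
}
```

In this model a thread would call addCritical, trigger the full GC, then call tryAllocateCritical, and call removeCritical only if that final attempt also fails. A competing thread calling tryAllocate in between gets false instead of consuming the memory the GC just freed.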
-------------
PR: https://git.openjdk.java.net/jdk/pull/2289