Withdrawn: 8259643: ZGC can return metaspace OOM prematurely

Wed May 19 16:08:48 UTC 2021

On Thu, 28 Jan 2021 12:55:55 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
> 
> 1. full_gc()
> 2. final_allocation_attempt()
> 
> If step 2 still fails after the full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between steps 1 and 2, an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, so the final allocation attempt fails and the thread throws. This can lead to a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this by inserting sleeps in the right places.
> 
> The way we deal with this particular issue for heap allocations is to have an allocation request queue and to satisfy those allocations before others, which prevents starvation. My solution to this metaspace OOM problem is to do exactly that: have a queue of "critical" allocations that get precedence over normal metaspace allocations.
> 
> The solution should work for other concurrent GCs (which likely have the same issue), but I have only tried it with ZGC, so at this point I am only hooking ZGC into the new API (for concurrently unloading GCs to manage critical metaspace allocations).
> 
> Passes the ZGC tests in tiers 1-5, as well as the particular test that failed (with the JVM sleeps that make it fail deterministically).
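
For readers unfamiliar with the idea, the "critical allocation queue" described above can be sketched roughly as follows. This is an illustrative Java model only, not the HotSpot change from the PR (which is C++). The class `CriticalAllocationSketch` and its methods `allocate`, `enqueueCritical`, `release` and `cancel` are hypothetical names invented for this sketch. Having normal allocations simply fail while critical requests are pending is likewise an assumption made to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a bounded "metaspace" with a queue of critical requests.
final class CriticalAllocationSketch {
    // A request that already failed once and must not be starved by other threads.
    static final class Request {
        final long size;
        boolean satisfied;
        Request(long size) { this.size = size; }
    }

    private final long capacity;
    private long used;
    private final Deque<Request> critical = new ArrayDeque<>();

    CriticalAllocationSketch(long capacity) { this.capacity = capacity; }

    // Normal path: refuses to allocate while any critical request is pending
    // (assumption of this sketch; a real allocator might block or retry instead).
    synchronized boolean allocate(long size) {
        if (!critical.isEmpty()) {
            return false;
        }
        return tryReserve(size);
    }

    // Registers a request that takes precedence over normal allocations.
    synchronized Request enqueueCritical(long size) {
        Request r = new Request(size);
        critical.addLast(r);
        return r;
    }

    // Called after a GC freed `freed` units: serves queued requests in FIFO order.
    synchronized void release(long freed) {
        used = Math.max(0, used - freed);
        while (!critical.isEmpty() && tryReserve(critical.peekFirst().size)) {
            critical.removeFirst().satisfied = true;
        }
    }

    // Gives up on a request (the caller would then report OOM),
    // so that normal allocations are not blocked forever.
    synchronized void cancel(Request r) {
        critical.remove(r);
    }

    private boolean tryReserve(long size) {
        if (used + size > capacity) {
            return false;
        }
        used += size;
        return true;
    }

    synchronized long used() { return used; }
    synchronized int pending() { return critical.size(); }
}
```

In this model, memory freed by a GC goes to the queued request first, so a thread preempted between the GC and its final attempt cannot lose that memory to later allocations from other threads.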

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289