RFR: 8259643: ZGC can return metaspace OOM prematurely [v3]

Erik Österlund eosterlund at openjdk.java.net
Mon Nov 15 15:31:06 UTC 2021


> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
> 
> 1. full_gc()
> 2. final_allocation_attempt()
> 
> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this by inserting sleeps at the right places.
> 
> The way we deal with this particular issue for heap allocations is to have an allocation request queue and to satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem is to do exactly that: keep a queue of "critical" allocations that take precedence over normal metaspace allocations.
> 
> The solution should work for other concurrent GCs (which likely have the same issue), but I have only tried it with ZGC, so for now only ZGC is hooked into the new API that lets concurrently unloading GCs manage critical metaspace allocations.
> 
> Passes ZGC tests in tiers 1-5, as well as the particular test that failed (with the JVM sleeps that make it fail deterministically).

Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:

 - Merge branch 'master' into 8259643_load_unload_bug
 - polish code alignment and rename register/unregister to add/remove
 - 8259643: ZGC can return metaspace OOM prematurely

-------------

Changes: https://git.openjdk.java.net/jdk/pull/2289/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=02
  Stats: 298 lines in 6 files changed: 276 ins; 17 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2289.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289

PR: https://git.openjdk.java.net/jdk/pull/2289


More information about the hotspot-dev mailing list