RFC: JEP: Elastic Metaspace

Wed Nov 27 17:05:50 UTC 2019

Hi all,

As some of you know, I have been working on a prototype for a new Metaspace. I have now reached the point where the prototype is done, works well, and is stable. The results are promising, and I would like to get feedback on how best to proceed.

The JEP is still in draft state ([1]). In my mind it is the spiritual counterpart to JEPs like JEP 346: "Promptly Return Unused Committed Memory from G1" or JEP 351: "ZGC: Uncommit Unused Memory".

The new Metaspace is a wholesale replacement of the old one and has the following advantages:

- It is much more elastic. In situations involving mass class unloading, we see a significant reduction in committed memory. For an extreme example, see [2], which demonstrates how the new implementation recovers from usage spikes compared to the old one. In that example, Metaspace is reduced by about 70% after class unloading.
- There are modest memory savings even without class unloading. For many applications, we see a reduction in committed Metaspace of about 5-10%.
- (I believe) the new implementation is cleaner and cheaper to maintain in the long term. It does away with many peculiarities of the old implementation, which had grown organically over time. Its subparts are cleanly separated and can be changed, tested, and even replaced individually.

If you'd like to take a look and give the prototype a spin, it lives in the jdk-sandbox repository, under the branch "stuefe-new-metaspace-branch" [3].

---
A quick run-through of what changed with the new Metaspace, and what stayed the same:

- We still use mmap(). We still have two spaces, the non-class space and the class space, with the same basic layout: a chained list of memory regions for the non-class space, and a contiguous region for the Compressed Class space.

(We could of course question this setup. For example, we could get rid of the non-class-space region chain and let everything live in a pre-reserved contiguous range: basically the former class space, but now containing all metadata. This would have some technical benefits at the cost of losing the potential for unlimited, "zero maintenance" growth. But in conversations with Oracle I found that this was not desired, and it is not really that important.)

- So we reserve memory as before, but we do not commit it with the typical high-water-mark (HWM) scheme. Instead, the memory is divided into homogeneous sections of n pages, and each section (a "commit granule") can be committed and uncommitted individually.
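To illustrate the granule model, here is a minimal Python sketch (not HotSpot code) of a reserved range whose commit state is tracked per granule. The 64K granule size and the names `CommitMask`, `commit` and `uncommit` are assumptions for illustration only; where the comments say so, real code would ask the OS to commit or uncommit the pages.

```python
# Simplified model of a reserved range split into commit granules.
# Assumption: a 64K granule size, chosen only for illustration.
GRANULE_BYTES = 64 * 1024


class CommitMask:
    """Tracks, per commit granule, whether that granule is committed."""

    def __init__(self, reserved_bytes: int) -> None:
        assert reserved_bytes % GRANULE_BYTES == 0
        self.committed = [False] * (reserved_bytes // GRANULE_BYTES)

    def _granules(self, offset: int, size: int) -> range:
        # Indices of all granules overlapping [offset, offset + size).
        first = offset // GRANULE_BYTES
        last = (offset + size - 1) // GRANULE_BYTES
        return range(first, last + 1)

    def commit(self, offset: int, size: int) -> int:
        """Commit every granule touching the range; return newly committed bytes."""
        newly = 0
        for g in self._granules(offset, size):
            if not self.committed[g]:
                self.committed[g] = True  # real code would commit the pages here
                newly += GRANULE_BYTES
        return newly

    def uncommit(self, offset: int, size: int) -> int:
        """Uncommit a granule-aligned range; return released bytes."""
        assert offset % GRANULE_BYTES == 0 and size % GRANULE_BYTES == 0
        released = 0
        for g in self._granules(offset, size):
            if self.committed[g]:
                self.committed[g] = False  # real code would uncommit the pages here
                released += GRANULE_BYTES
        return released

    def committed_bytes(self) -> int:
        return sum(self.committed) * GRANULE_BYTES
```

Committing any byte of a granule commits the whole granule, so a later allocation that spills into the next granule only pays for that one additional granule.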

- On top of that model we still have chunks as before, but these chunks can now be committed, uncommitted, or partly committed. When memory is allocated from a chunk, the underlying commit granules are committed automatically. That makes it possible to hand large chunks to class loaders without paying the full price up front.
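A simplified model of such a lazily committed chunk, again in Python and under hypothetical names: the chunk commits granules from its start upward as allocations cross granule boundaries, so a 4M chunk handed to a loader that uses only a few kilobytes costs a single granule. The 64K granule size, the granule-aligned chunk, and the single commit watermark are simplifying assumptions, not the prototype's actual design.

```python
GRANULE_BYTES = 64 * 1024  # assumption: illustrative granule size


class Chunk:
    """A chunk whose backing memory is committed lazily, granule by granule.

    Simplification: the chunk is granule-aligned and memory is committed from
    the chunk's start upward, so one watermark describes what is committed.
    """

    def __init__(self, base: int, size: int) -> None:
        assert size % GRANULE_BYTES == 0
        self.base = base
        self.size = size
        self.used = 0       # bytes handed out to the class loader
        self.committed = 0  # bytes committed, always a multiple of GRANULE_BYTES

    def allocate(self, nbytes: int):
        """Return the address of a new block, or None if the chunk is full."""
        needed = self.used + nbytes
        if needed > self.size:
            return None
        if needed > self.committed:
            # Round up to the next granule boundary and commit only up to there
            # (real code would commit the granules between the old and new top).
            self.committed = -(-needed // GRANULE_BYTES) * GRANULE_BYTES
        addr = self.base + self.used
        self.used = needed
        return addr
```

A loader that allocates 1000 bytes from a fresh 4M chunk causes one 64K granule to be committed; the remaining ~4M stays reserved but uncommitted.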

- Chunk sizes follow a power-of-two buddy allocator [4] scheme: they range from very small (1K) up to large (4M) in power-of-two steps. On allocation, larger chunks are split to produce the desired chunk size; on deallocation, chunks are fused with neighboring free buddies to form larger chunks. We also no longer have humongous chunks, since they are unnecessary.
In principle, we have had a weird, crooked form of buddy allocator even in the current Metaspace since [5], when we introduced chunk coalescing with JDK 11. However, due to the odd chunk geometry of the current allocator, and due to things like humongous chunks, the current implementation is inefficient, more expensive, and far more complicated than necessary. The beauty of buddy-style allocation is that it is dead simple and cheap to implement, and that everyone knows it, which makes maintenance easier.
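For readers less familiar with the scheme, the following is a textbook power-of-two buddy allocator over a single 4M root chunk, with chunk sizes from 1K to 4M as described above. It is a Python sketch of the general technique [4], not the prototype's code; the class and method names are hypothetical. A chunk's buddy is found by flipping the bit corresponding to its size in its offset, which is what makes splitting and fusing cheap.

```python
MIN_CHUNK = 1024             # 1K, smallest chunk size
MAX_CHUNK = 4 * 1024 * 1024  # 4M, largest ("root") chunk size


class BuddyAllocator:
    """Textbook power-of-two buddy allocator over one root chunk at offset 0."""

    def __init__(self) -> None:
        # free[size] holds the offsets of free chunks of exactly that size
        # (13 sizes: 1K, 2K, ..., 4M).
        self.free = {MIN_CHUNK << i: set() for i in range(13)}
        self.free[MAX_CHUNK].add(0)

    @staticmethod
    def chunk_size_for(nbytes: int) -> int:
        """Round a request up to the next power-of-two chunk size (at least 1K)."""
        size = MIN_CHUNK
        while size < nbytes:
            size *= 2
        return size

    def allocate(self, nbytes: int):
        """Return (offset, size) of a chunk, or None if no chunk is available."""
        want = self.chunk_size_for(nbytes)
        if want > MAX_CHUNK:
            return None
        # Find the smallest free chunk that is large enough.
        size = want
        while size <= MAX_CHUNK and not self.free[size]:
            size *= 2
        if size > MAX_CHUNK:
            return None
        offset = self.free[size].pop()
        # Split: keep the lower half, put the upper half (the buddy) on a free list.
        while size > want:
            size //= 2
            self.free[size].add(offset + size)
        return offset, want

    def release(self, offset: int, size: int) -> None:
        """Return a chunk, fusing it with its buddy as long as the buddy is free."""
        while size < MAX_CHUNK:
            buddy = offset ^ size
            if buddy not in self.free[size]:
                break
            self.free[size].remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free[size].add(offset)
```

Allocating two 4K chunks splits the root chunk down to 4K; releasing both fuses them step by step with their free buddies until the whole 4M root chunk is free again.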

- When the loader is collected, its chunks are released into freelists, where they are fused in buddy-style fashion to form larger chunks. If a chunk's size surpasses a (tunable) threshold, the memory underlying that chunk is uncommitted.
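The release path can be sketched the same way: a freed chunk is fused with any free buddies, and if the resulting chunk reaches the uncommit threshold, its memory is uncommitted. The 64K threshold and the function name `release_chunk` are hypothetical; in this simplified Python model, uncommitting is only recorded in a list.

```python
MAX_CHUNK = 4 * 1024 * 1024     # 4M root chunk size
UNCOMMIT_THRESHOLD = 64 * 1024  # hypothetical value for the tunable threshold


def release_chunk(free, offset, size, uncommitted):
    """Return a chunk to the freelists, fuse it with free buddies, and
    uncommit its memory if the fused chunk reaches the threshold.

    free: dict mapping chunk size -> set of offsets of free chunks
    uncommitted: list collecting (offset, size) ranges that were uncommitted
    """
    while size < MAX_CHUNK:
        buddy = offset ^ size
        if buddy not in free.setdefault(size, set()):
            break
        free[size].remove(buddy)
        offset = min(offset, buddy)
        size *= 2
    free.setdefault(size, set()).add(offset)
    if size >= UNCOMMIT_THRESHOLD:
        uncommitted.append((offset, size))  # real code would uncommit the granules here
    return offset, size
```

If a collected loader owned two adjacent 32K buddies, releasing the first leaves a free 32K chunk that stays committed; releasing the second fuses both into a 64K chunk, which reaches the threshold and is uncommitted.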

Please see the JEP description [1] for more details.

--

This is ongoing work, and not every improvement is listed in the JEP, since it is supposed to be a high-level view. I am currently in the "tweaking" phase: tuning, and building small additions that make the Metaspace allocator behave smarter in corner cases.

One example is the treatment of "Micro-ClassLoaderData": CLDs that load only one class, e.g. Reflection delegator classes or hidden classes for lambdas. These CLDs will only ever allocate one InstanceKlass, and in these cases it is inefficient to use the full SpaceManager-Chunk machinery for the class space part. That can be done much more simply and would save about 10% of committed Metaspace in lambda-heavy cases.

--

But I am not sure how to proceed now, so I would love to get feedback on this. My plan is to get this into JDK 15 if possible. Long term, I would also love to backport this to older releases. Since it is a fairly isolated piece of machinery with only loose ties to the rest of the VM, that should be possible without too many problems.

I am also not sure if a JEP is the right vehicle. I would not mind a JEP number for this, but my priority is to bring this in. If a JEP makes sense, I would be happy if someone were to sponsor this.

Thank you,

Thomas

[1] https://bugs.openjdk.java.net/browse/JDK-8221173
[2] https://bugs.openjdk.java.net/secure/attachment/85771/test-results.pdf
[3] http://hg.openjdk.java.net/jdk/sandbox/shortlog/54750b448264
[4] https://en.wikipedia.org/wiki/Buddy_memory_allocation
[5] https://bugs.openjdk.java.net/browse/JDK-8198423