Thoughts about improving the metaspace allocator

Sat Feb 2 20:19:11 UTC 2019

Small correction: my first sentence refers to "JEP 346: Promptly Return
Unused Committed Memory from G1", not "JEP Dynamic Max Memory Limit". I
hope the rest of my mail still made sense :)

Cheers, Thomas

On Fri, Feb 1, 2019 at 1:53 PM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:

> Hi all,
>
> (Not sure which mailing list is the best fit; I'll start with hs-gc. Please
> feel free to move it.)
>
> JEP "Dynamic Max Memory Limit" aims to increase the elasticity of Java
> heap memory consumption. I wonder whether the same would make sense for
> metaspace? Granted, we typically use far less memory for metaspace than
> for the heap, but there are quite a few corners where memory is wasted,
> mainly in situations where many classloaders come and go, leaving metaspace
> chunks marooned in the VM.
>
> In particular, the following two areas waste the most memory:
> - metaspace memory in the freelist (not owned by any loader)
> - metaspace memory in chunks that are still owned by a loader which no
> longer allocates from them, so the memory stays pinned to that loader.
>
> All this memory is wasted in the sense that though it could be reused in
> the future should new classes be loaded, this may never happen and the
> memory is still part of the VM process.
>
> --
>
> How metaspace is currently returned to the OS:
>
> Memory for metaspace is allocated in 2MB mappings (VirtualSpaceNode)
> which are kept in a chain. The chain grows as more memory is allocated.
> When a loader requests metaspace, a chunk (Metachunk) is carved off the
> top VirtualSpaceNode and handed out. These Metachunks come in various
> sizes between 1K and 64K.
>
> When a classloader dies, it returns all its Metachunks to the metaspace
> allocator, which puts them into a freelist for possible reuse by a future
> class loader. Should all chunks in a VirtualSpaceNode become free, the
> VirtualSpaceNode itself is removed from its chain and unmapped.
>
> This means memory is returned a bit arbitrarily: all chunks within a 2MB
> area must be freed, and only then is the node unmapped. Whether or not this
> works depends heavily on fragmentation. A single classloader holding a
> 1K chunk in this node hostage will keep the whole 2MB node alive.
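
(A toy model of the node-level rule described above, not HotSpot code. The
names `unmappable` and `pinned_bytes` and the tuple layout are made up for
illustration; only the 2MB node size and the 1K chunk come from the text.)

```python
NODE_SIZE = 2 * 1024 * 1024  # one 2MB VirtualSpaceNode, as described above


def unmappable(chunks):
    """chunks: list of (size_in_bytes, in_use) tuples carved from one node.

    The node may be unmapped only when no chunk in it is still in use.
    """
    return all(not in_use for _size, in_use in chunks)


def pinned_bytes(chunks):
    """Bytes of the node that stay mapped although no loader uses them."""
    if unmappable(chunks):
        return 0  # the whole node goes away, nothing stays pinned
    used = sum(size for size, in_use in chunks if in_use)
    return NODE_SIZE - used


# One loader still holds a single 1K chunk; everything else is free.
hostage_node = [(1024, True), (NODE_SIZE - 1024, False)]
all_free_node = [(NODE_SIZE, False)]
```

In this model `pinned_bytes(hostage_node)` is 2MB minus 1K: the single live
1K chunk keeps the rest of the node mapped.
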
>
> In addition, this does not work at all for the compressed class space.
> There, we do not have a chain of mappings but just one large mapping, which
> never gets unmapped. So, memory for the compressed class space is never
> returned to the OS.
>
> -----
>
> First idea: uncommit free meta chunks
>
> Metachunks are returned to the freelist and there they do no good, so one
> could theoretically uncommit them as long as they are not needed, no? While
> keeping the address range still intact?
>
> But the problem is that Metachunks are not guaranteed to span multiple
> pages; in fact, they may often be smaller than one page. Also, the
> Metachunk header must not be compromised, so we cannot uncommit the first
> page of a Metachunk since it contains the header. So in reality we would
> only be able to uncommit the payload area of larger chunks (medium and
> humongous), which are 32K or larger.
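
(To illustrate the page arithmetic: a minimal sketch, assuming a 4K page
size and a header that fits entirely into the first page touched by the
chunk. The function name `uncommittable_range` is hypothetical; this is not
how the patch is written, just the idea.)

```python
def align_down(addr, alignment):
    """Round addr down to a multiple of alignment (a power of two)."""
    return addr & ~(alignment - 1)


def uncommittable_range(chunk_start, chunk_size, page_size=4096):
    """Return (start, end) of the pages of a free chunk that could be
    uncommitted, or None if there are none.

    The page holding the chunk header is always kept, and only whole pages
    that lie completely inside the chunk are candidates.
    """
    start = align_down(chunk_start, page_size) + page_size  # skip header page
    end = align_down(chunk_start + chunk_size, page_size)   # last full page
    if end > start:
        return (start, end)
    return None
```

For a 64K chunk at 0x10000 this yields (0x11000, 0x20000), i.e. 60K of
payload. A 1K or 4K chunk yields None, which is why small chunks do not
benefit.
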
>
> Fortunately all this has been greatly simplified - more by accident -
> by "8198423: Improve metaspace chunk allocation": There, we made it so that
> chunks which are returned to the freelist are automatically fused with
> neighboring chunks to form larger chunks. Also, with that change we
> introduced the rule that all chunks must be aligned to their size, so e.g.
> 4K chunks are 4K aligned etc.
>
> This means that we have a natural tendency for free metachunks to form
> larger chunks, and that those are aligned nicely. That makes uncommitting
> their payload easy and rewarding.
>
> Here is a patch which does just that. The patch is very minimal:
>
>
> http://cr.openjdk.java.net/~stuefe/webrevs/autouncommit-metachunks/webrev.00/webrev/index.html
>
> To test whether this works, I wrote a small test which creates 1000 class
> loaders, each loading 10 classes, which uses up ~200M of metaspace. Then I
> started unloading them in a random fashion, until all are unloaded. The
> random unloading causes high fragmentation.
>
> In the stock hotspot, we can see that the released memory is kept in the
> freelist, but almost no memory is given back to the OS until almost the
> end:
>
> Alive  RSS(kb)  freelist(kb)
>  1000   377780            28
>   900   378412         18428
>   800   375168         37012
>   700   375240         55412
>   600   375328         73996
>   500   375328         92028
>   400   375328        110428
>   300   372136        128758
>   200   372008        145110
>   100   357672        149357
>
> That is not surprising, since the memory is highly fragmented and only at
> the last step was a node completely free so that it could be unmapped.
>
> With my patch, RSS starts dropping much earlier:
>
> Alive  RSS(kb)  freelist(kb)
>  1000   390464            18
>   900   380564         18418
>   800   366232         36818
>   700   351504         55218
>   600   326172         73618
>   500   310928         92570
>   400   296396        110418
>   300   280360        128748
>   200   264948        145110
>   100   245540        149357
>
> The freelist content is identical, but it is now filled with chunks whose
> payload was uncommitted, therefore RSS starts going down. At the last step,
> with 100 loaders still alive, we have given about 100MB back to the system.
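
(The arithmetic behind that estimate, using only the first and last rows of
the two tables above. The helper names are made up for this note.)

```python
# RSS in KB keyed by number of live loaders, copied from the tables above.
stock = {1000: 377780, 100: 357672}
patched = {1000: 390464, 100: 245540}


def rss_drop_kb(run):
    """RSS given back between 1000 and 100 live loaders within one run."""
    return run[1000] - run[100]


def kb_to_mb(kb):
    """Convert KB to MB (1 MB = 1024 KB)."""
    return kb / 1024


# How much lower the patched VM ends up compared to the stock VM.
final_gap_kb = stock[100] - patched[100]
```

Here `rss_drop_kb(stock)` is 20108 KB and `rss_drop_kb(patched)` is 144924
KB. With 100 loaders alive, the patched run sits 112132 KB (roughly 110MB)
below the stock run, which is in line with the "about 100MB" above.
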
>
> Of course this random scenario benefits most from my patch. Savings are
> smaller when classloaders are released in a LIFO fashion, because metaspace
> is more clustered and the chance of Metachunks neighboring chunks of
> the same loader is higher.
>
> (We may improve this patch by moving the headers out of the Metachunks
> altogether, keeping chunk information separate from the payloads.)
>
> (I did not look closely at the cost of committing/uncommitting. One may
> have to do this a bit smarter than I did in this patch to avoid expensive
> commit/uncommit cycles, e.g. always leave a certain number of free chunks
> committed.)
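
(One way such a reserve could look: a rough sketch only, not part of the
patch. The class `FreeList`, its fields and the counting of commit/uncommit
calls are hypothetical and just model the policy of keeping up to N free
chunks committed.)

```python
class FreeList:
    """Toy freelist that keeps up to `keep_committed` free chunks committed
    and uncommits only the overflow."""

    def __init__(self, keep_committed):
        self.keep_committed = keep_committed
        self.committed = []      # free chunks, payload still committed
        self.uncommitted = []    # free chunks, payload uncommitted
        self.commit_calls = 0    # stands in for the cost of committing
        self.uncommit_calls = 0  # stands in for the cost of uncommitting

    def add(self, chunk):
        """A loader died and returned this chunk."""
        self.committed.append(chunk)
        while len(self.committed) > self.keep_committed:
            oldest = self.committed.pop(0)
            self.uncommit_calls += 1
            self.uncommitted.append(oldest)

    def take(self):
        """A loader needs a chunk; prefer one that is still committed."""
        if self.committed:
            return self.committed.pop()
        if self.uncommitted:
            self.commit_calls += 1
            return self.uncommitted.pop()
        return None


# A loader that repeatedly frees and re-requests one chunk never triggers
# a commit or uncommit as long as the reserve is at least one chunk.
churn = FreeList(keep_committed=1)
for _ in range(100):
    churn.add("chunk")
    churn.take()
```

The reserve size trades retained memory against commit/uncommit traffic;
what a good value would be is an open question.
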
>
> So, this may be a valid - more fluid and smooth - alternative way to give
> memory back to the OS than unmapping VirtualSpaceNode nodes.
>
> -----
>
> Thinking further: do we then even need the virtual space list?
>
> IIUC the VirtualSpaceList exists for two reasons:
>
> 1) to make it possible to grow infinitely without having to deal with an
> upper limit.
> 2) to make it possible to give freed memory back to the OS
>
> (1) one could argue this is a goal we never really reached. Most of our
> customers actually specify MaxMetaspaceSize to limit the metaspace. More
> importantly, we have to specify CompressedClassSpaceSize in any case, and
> that limits metaspace growth even if MaxMetaspaceSize is not specified.
> (2) would arguably be not needed anymore with my patch - especially if we
> moved the Metachunk headers somewhere else.
>
> So, instead of the virtual space list we could allocate the non-class
> metaspace portion as one contiguous region upfront, same as the class
> space, and then commit them as needed. We only have to sacrifice the notion
> of limitless expansion.
>
> Getting rid of VirtualSpaceList in favor of one large mapping would have
> the following advantages:
>
> - Simplicity. The metaspace coding has gotten quite complex over time and
> every bit we retire is nice for maintenance.
> - Fewer mappings: The virtual space list can get quite large and that
> shows up as a lot of memory mappings, at least on Linux. There is actually
> a limit to the number of mappings a process may have and we have hit this
> in the past with customers. These mappings also cause overhead in the Linux
> kernel.
> - Less waste at the VirtualSpaceNode level. Not large by any means but it
> still counts.
>
> ----------
>
> Thinking even further: Do we still need the class/non-class dichotomy?
> (This is more of an actual question, I am really unsure about this)
>
> Let's say we get rid of the virtual space list and now have two large
> memory mappings side by side, the non-class and the class space. Why do we
> need two?
>
> We could theoretically combine them to just one area, which would be just
> "the metaspace" and contain both class and non-class data.
>
> This would have the following pros and cons:
>
> + Again, simplicity. Getting rid of this dichotomy would really simplify
> the coding. It would also be easier to understand and to explain to
> customers. Only one switch would be needed for sizing.
> + We would save quite a bit of wasted memory, especially with many small
> loaders which load many small chunks. Currently, each loader has to
> allocate at least two chunks, which effectively doubles the overhead.
>
> But I see some cons too:
>
> - For compressed class pointers to work, the total size of the class space
> must not exceed 3G. This limit would now apply to the combined size of
> class and non-class metadata. I do not know - do we ever exceed 3G total
> metaspace?
> - Increasing the size may make it less probable to fit into the lower 32GB
> address space and use zero-based addressing for the compressed Klass*
> pointers.
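
(To spell out the second point: a minimal sketch, assuming a 32-bit narrow
Klass pointer and a shift of 3, which is where the 32GB figure comes from.
The function names are hypothetical and this is not the VM's code.)

```python
GB = 1024 ** 3
NARROW_BITS = 32  # assumption: compressed Klass* is a 32-bit value
SHIFT = 3         # assumption: 8-byte alignment, so 2^32 << 3 = 32GB


def can_use_zero_base(range_start, range_size, shift=SHIFT):
    """Zero-based encoding works if the whole range ends below the limit
    reachable by an unbiased narrow pointer."""
    return range_start + range_size <= (1 << NARROW_BITS) << shift


def encode(addr, base, shift=SHIFT):
    """Compress a Klass address relative to base."""
    return (addr - base) >> shift


def decode(narrow, base, shift=SHIFT):
    """Expand a narrow value back into a full address."""
    return base + (narrow << shift)
```

With these assumptions a 1G range placed at 30G still fits below 32GB, but
a 3G range at the same address does not, so a larger combined space leaves
fewer placements where the base can be zero.
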
>
> ---
>
> Thank you for your time. What do you think?
>
> Kind Regards, Thomas