<div dir="ltr"><div dir="ltr"><div>Hi all,</div><div><br></div><div>(Not sure which mailing list is the best fit; I am starting with hs-gc. Please feel free to move this.)</div><div><br></div><div>JEP "Dynamic Max Memory Limit" aims to increase the elasticity of Java heap memory consumption. I wonder whether the same would make sense for metaspace. Granted, we typically use far less memory for metaspace than for the heap, but there are quite a few corners where memory is wasted, mainly in situations where many class loaders come and go, leaving metaspace chunks marooned in the VM.<br></div><div><br></div><div>In particular, the following two areas waste the most memory:</div><div>- metaspace memory in the freelist (not owned by any loader)</div><div>- metaspace in chunks still owned by loaders that no longer allocate, so the memory is pinned to the loader.</div><div><br></div><div>All this memory is wasted in the sense that, although it could be reused should new classes be loaded in the future, this may never happen, and in the meantime the memory still counts toward the footprint of the VM process.</div><div><br></div><div>--<br></div><div><br></div><div>How metaspace is currently returned to the OS:</div><div><br></div><div>Memory for metaspace is allocated in 2MB mappings (VirtualSpaceNode), which are kept in a chain. The chain grows as more memory is needed. When a loader requests metaspace, a chunk (Metachunk) is carved off the top VirtualSpaceNode and handed out. Metachunks come in various sizes between 1K and 64K.</div><div><br></div><div>When a class loader dies, it returns all its Metachunks to the metaspace allocator, which puts them into a freelist for possible reuse by a future class loader. Should all chunks in a VirtualSpaceNode become free, the VirtualSpaceNode itself is removed from its chain and unmapped.</div><div><br></div><div>This means memory is returned somewhat arbitrarily: only once all chunks within a 2MB node have been freed is the node unmapped. 
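</div><div><br></div><div>To make the effect of fragmentation concrete, here is a small toy model (a hypothetical Python sketch, not HotSpot code). It assumes that every loader owns exactly one 64K chunk and that chunks are carved from 2MB nodes in allocation order; a node counts as mapped as long as at least one of its chunks is still in use. Freeing half of 1024 loaders in LIFO order releases half of the nodes, while freeing the same number of loaders in random order leaves practically every node pinned.</div>
<pre>
```python
import random

NODE_SIZE_KB = 2048   # VirtualSpaceNode size, as described above
CHUNK_SIZE_KB = 64    # simplifying assumption: each loader owns one 64K chunk
CHUNKS_PER_NODE = NODE_SIZE_KB // CHUNK_SIZE_KB  # 32 chunks per node


def mapped_kb_after_freeing(num_loaders, free_order, num_freed):
    """Toy model: loader i's chunk lives in node i // CHUNKS_PER_NODE.
    Free the first num_freed loaders of free_order and return how many KB of
    nodes remain mapped (a node is unmapped only when all its chunks are free)."""
    num_nodes = -(-num_loaders // CHUNKS_PER_NODE)  # ceiling division
    live_chunks = [0] * num_nodes
    for loader in range(num_loaders):
        live_chunks[loader // CHUNKS_PER_NODE] += 1
    for loader in free_order[:num_freed]:
        live_chunks[loader // CHUNKS_PER_NODE] -= 1
    mapped_nodes = sum(1 for count in live_chunks if count > 0)
    return mapped_nodes * NODE_SIZE_KB


N = 1024  # 1024 loaders * 64K = 64MB, spread over 32 nodes
lifo_order = list(range(N - 1, -1, -1))
random_order = list(range(N))
random.Random(42).shuffle(random_order)

# LIFO: the upper 16 nodes become completely free and can be unmapped.
print(mapped_kb_after_freeing(N, lifo_order, N // 2))  # 32768
# Random: nearly every node still holds at least one live chunk.
print(mapped_kb_after_freeing(N, random_order, N // 2))
```
</pre>
<div><br></div><div>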
How well this works depends heavily on fragmentation: a single class loader holding a 1K chunk in a node hostage will keep the whole 2MB node alive.</div><div><br></div><div>In addition, this does not work at all for the compressed class space. There, we do not have a chain of mappings but just one large mapping, which never gets unmapped. So memory for the compressed class space is never returned to the OS.</div><div><br></div><div>-----</div><div><br></div><div>First idea: uncommit free Metachunks</div><div><br></div><div>Metachunks in the freelist do no good there, so one could in theory uncommit them for as long as they are not needed, keeping the address range intact, no?</div><div><br></div><div>The problem is that Metachunks are not guaranteed to span multiple pages; in fact, they are often smaller than one page. Also, the Metachunk header must stay intact, so we cannot uncommit the first page of a Metachunk, since it contains the header. In practice, we would therefore only be able to uncommit the payload area of larger chunks (medium and humongous), which are 32K or larger.</div><div><br></div><div>Fortunately, all this has been greatly simplified, more by accident than by design, by "8198423: Improve metaspace chunk allocation". That change made chunks returned to the freelist automatically fuse with neighboring free chunks to form larger chunks. It also introduced the rule that all chunks must be aligned to their size, e.g. 4K chunks are 4K-aligned.</div><div><br></div><div>This means free Metachunks naturally tend to coalesce into larger chunks, and those chunks are nicely aligned. That makes uncommitting their payload easy and rewarding.</div><div><br></div><div>Here is a patch which does just that. 
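</div><div><br></div><div>The arithmetic behind "easy and rewarding" can be sketched with a toy calculation (hypothetical Python, not code from the patch). It assumes 4K pages and a made-up 64-byte header at the start of each chunk, and computes which page-aligned part of a free chunk's payload could be uncommitted while the header page stays committed. Sixteen separate free 4K chunks yield nothing, while the same 64K region fused into one size-aligned chunk yields 60K.</div>
<pre>
```python
PAGE_SIZE = 4 * 1024   # assumption: 4K pages, as on typical x86-64 Linux
HEADER_SIZE = 64       # hypothetical Metachunk header size, for illustration only


def align_up(value, alignment):
    return -(-value // alignment) * alignment


def align_down(value, alignment):
    return (value // alignment) * alignment


def uncommittable_range(chunk_start, chunk_size):
    """Return (start, end) of the page-aligned payload of a free chunk that
    could be uncommitted, or None. The page holding the header stays committed."""
    first = align_up(chunk_start + HEADER_SIZE, PAGE_SIZE)
    last = align_down(chunk_start + chunk_size, PAGE_SIZE)
    return (first, last) if last > first else None


# Sixteen separate free 4K chunks covering one 64K-aligned region:
# each consists of nothing but its header page, so nothing can be uncommitted.
separate = [uncommittable_range(0x10000 + i * 0x1000, 0x1000) for i in range(16)]
print(separate.count(None))  # 16

# The same region fused into a single 64K chunk: everything past the
# header page, i.e. 60K, can be uncommitted.
start, end = uncommittable_range(0x10000, 0x10000)
print((end - start) // 1024)  # 60
```
</pre>
<div>Actual page and header sizes may differ; the point is only that fusing plus size alignment turns many unusable fragments into one large, page-aligned payload.</div><div><br></div><div>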
The patch is very small:</div><div><br></div><div><a href="http://cr.openjdk.java.net/~stuefe/webrevs/autouncommit-metachunks/webrev.00/webrev/index.html">http://cr.openjdk.java.net/~stuefe/webrevs/autouncommit-metachunks/webrev.00/webrev/index.html</a></div><div><br></div><div>To test whether this works, I wrote a small test that creates 1000 class loaders, each loading 10 classes, which uses up ~200MB of metaspace. Then I unloaded them in random order until all were unloaded. The random unloading causes high fragmentation.</div><div><br></div><div>With stock HotSpot, the released memory is kept in the freelist, but almost no memory is given back to the OS until nearly the end:</div><div><br></div><table><tr><th>Loaders alive</th><th>RSS (KB)</th><th>Freelist (KB)</th></tr><tr><td>1000</td><td>377780</td><td>28</td></tr><tr><td>900</td><td>378412</td><td>18428</td></tr><tr><td>800</td><td>375168</td><td>37012</td></tr><tr><td>700</td><td>375240</td><td>55412</td></tr><tr><td>600</td><td>375328</td><td>73996</td></tr><tr><td>500</td><td>375328</td><td>92028</td></tr><tr><td>400</td><td>375328</td><td>110428</td></tr><tr><td>300</td><td>372136</td><td>128758</td></tr><tr><td>200</td><td>372008</td><td>145110</td></tr><tr><td>100</td><td>357672</td><td>149357</td></tr></table><div><br></div><div>That is not surprising, since the memory is highly fragmented and only at the last step did a node become completely free so that it could be 
unmapped.</div><div><br></div><div>With my patch, RSS starts dropping much earlier:</div><div><br></div><table><tr><th>Loaders alive</th><th>RSS (KB)</th><th>Freelist (KB)</th></tr><tr><td>1000</td><td>390464</td><td>18</td></tr><tr><td>900</td><td>380564</td><td>18418</td></tr><tr><td>800</td><td>366232</td><td>36818</td></tr><tr><td>700</td><td>351504</td><td>55218</td></tr><tr><td>600</td><td>326172</td><td>73618</td></tr><tr><td>500</td><td>310928</td><td>92570</td></tr><tr><td>400</td><td>296396</td><td>110418</td></tr><tr><td>300</td><td>280360</td><td>128748</td></tr><tr><td>200</td><td>264948</td><td>145110</td></tr><tr><td>100</td><td>245540</td><td>149357</td></tr></table><div><br></div><div>The freelist content is practically identical, but it now consists of chunks whose payload has been uncommitted, so RSS goes down. At the last step, with 100 loaders still alive, we have given about 100MB more back to the system than the stock VM (245540 KB vs. 357672 KB RSS).</div><div><br></div><div>Of course, this random scenario benefits most from my patch. Savings are smaller when class loaders are released in LIFO order, because metaspace is then more clustered: a Metachunk is more likely to neighbor chunks of the same loader.</div><div><br></div><div>(We could improve this patch by moving the headers out of the Metachunks altogether, keeping chunk information separate from the payloads.)</div><div><br></div><div>(I did not look closely at the cost of committing/uncommitting. 
One may have to be smarter about this than my patch is, to avoid expensive commit/uncommit cycles, e.g. by always leaving a certain number of free chunks committed.)<br></div><div><br></div><div>So this may be a valid, smoother alternative to unmapping whole VirtualSpaceNodes as a way to give memory back to the OS.</div><div><br></div><div>-----</div><div><br></div><div>Thinking further: do we then even need the virtual space list?</div><div><br></div><div>IIUC, the VirtualSpaceList exists for two reasons:</div><div><br></div><div>1) to make it possible to grow indefinitely without having to deal with an upper limit;</div><div>2) to make it possible to give freed memory back to the OS.</div><div><br></div><div>Regarding (1), one could argue this is a goal we never really reached. Most of our customers specify MaxMetaspaceSize to limit metaspace anyway. More importantly, the class space size is always fixed by CompressedClassSpaceSize (set explicitly or by default), and that limits metaspace growth even if MaxMetaspaceSize is not specified.</div><div>Regarding (2), the list would arguably no longer be needed with my patch, especially if we moved the Metachunk headers somewhere else.<br></div><div><br></div><div>So, instead of the virtual space list, we could reserve the non-class metaspace as one contiguous region upfront, like the class space, and then commit it as needed. We would only have to sacrifice the notion of limitless expansion.</div><div><br></div><div>Getting rid of the VirtualSpaceList in favor of one large mapping would have the following advantages:</div><div><br></div><div>- Simplicity. The metaspace code has grown quite complex over time, and every bit we retire helps maintenance.</div><div>- Fewer mappings. The virtual space list can get quite long, and that shows up as a lot of memory mappings, at least on Linux. There is a limit to the number of mappings a process may have, and we have hit it with customers in the past. 
These mappings also cause overhead in the Linux kernel.</div><div>- Less waste at the VirtualSpaceNode level. Not large by any means, but it still counts.</div><div><br></div><div>----------</div><div><br></div><div>Thinking even further: do we still need the class/non-class dichotomy? (This is a genuine question; I am really unsure about it.)</div><div><br></div><div>Let's say we get rid of the virtual space list and now have two large memory mappings side by side, the non-class space and the class space. Why do we need two?</div><div><br></div><div>We could theoretically combine them into one area, which would simply be "the metaspace" and contain both class and non-class data.</div><div><br></div><div>This would have the following pros and cons:</div><div><br></div><div>+ Again, simplicity. Getting rid of this dichotomy would really simplify the code. It would also be easier to understand and to explain to customers, and only one switch would be needed for sizing.</div><div>+ We would save quite a bit of wasted memory, especially with many small loaders, each using only a few small chunks. Currently, each loader has to allocate at least two chunks (one in each space), which effectively doubles the per-loader overhead.</div><div><br></div><div>But I see some cons too:</div><div><br></div><div>- For compressed class pointers to work, the total size of the class space must not exceed 3G. This limit would now apply to the combined size of class and non-class metadata. I do not know: do we ever exceed 3G of total metaspace?</div><div>- A larger combined area may make it less likely to fit into the lower 32GB of the address space, where zero-based addressing can be used for the compressed Klass* pointers.</div><div><br></div><div>---</div><div><br></div><div>Thank you for your time. 
What do you think?</div><div><br></div><div>Kind Regards, Thomas</div></div></div>