Proposal for improvements to the metaspace chunk allocator

Tue Feb 20 15:42:57 UTC 2018

Hi Goetz,

thank you for taking the time to review this change! As per your
suggestion, I created an RFE for this:

https://bugs.openjdk.java.net/browse/JDK-8198423

I'll use this one to track work on this patch, unless there are strong
objections in favour of going with the real JEP process.

Please find more comments inline.

On Mon, Feb 19, 2018 at 3:05 PM, Lindenmaier, Goetz <
goetz.lindenmaier at sap.com> wrote:

> Hi Thomas,
>
> thanks for posting this change. I think it will help a
> lot, especially with the class space.
>
> I agree that this does not necessarily require a JEP. So could
> you please open a bug and post an RFR to hotspot-runtime-dev?
>
> Thanks for the laborious documentation, the code is easy
> to understand that way!
> Maybe put the text from your mail into the bug?
> It's very helpful and easier to locate there than to find it in
> the mail archive.
>
> My comments:
>
> take_from_committed():
> Do I understand correctly that this only takes the next
> needed piece of memory? And that, if the size passed to
> the current call is bigger than that of the last call,
> the alignment must be fixed, so you add what you call padding?
>

Yes.

>
> Is this also called for humongous chunks?
>

Yes.

> If not, for simplicity, I would have implemented this by just taking
> the next medium chunk (which would always be aligned) and
> split it into the needed size and add all the rest to
> the corresponding free lists.  But no change needed here,
> I just want to understand. (Probably this is not feasible
> because the humongous ones are not aligned to medium chunk size...)
>

You understand everything correctly.

As for your proposal, I am not sure it would make matters much simpler.
Maybe I do not fully understand:

Now, we do:
  - is watermark aligned to chunk size? No -> carve out padding chunks, add
them to freelist, then - with the watermark now properly aligned - carve
out the desired chunk we wanted in the first place.

After your proposal:
  - the watermark should always be correctly aligned. So, first, carve out
desired chunk. Then, if it is smaller than a medium chunk, carve out n
padding chunks until the watermark is properly aligned again.

Not sure this is better. Only the order of operations is reversed.

Also, yes, the one thorn is that humongous chunks are still unaligned, but
we could change the alignment rules for humongous chunks; that would not
be difficult.
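
To make the current order of operations concrete, here is a minimal sketch.
It is not the code from the patch: the chunk sizes (class-space-like 1:2:32
ratio, in bytes), the CarveResult struct and carve_chunk() are all assumptions
made up for illustration, and humongous chunks are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative chunk sizes in bytes, using the 1:2:32 class space ratio.
// These constants and all names below are assumptions, not the patch's code.
constexpr std::size_t kSpecialized = 1024;
constexpr std::size_t kSmall = 2 * kSpecialized;
constexpr std::size_t kMedium = 32 * kSpecialized;

struct CarveResult {
  std::vector<std::size_t> padding; // sizes of padding chunks, in carve order
  std::size_t chunk_start;          // offset at which the requested chunk begins
  std::size_t new_watermark;        // watermark after carving the requested chunk
};

// Align the watermark first by carving padding chunks, then carve the
// requested (non-humongous) chunk.
inline CarveResult carve_chunk(std::size_t watermark, std::size_t chunk_size) {
  assert(chunk_size == kSpecialized || chunk_size == kSmall || chunk_size == kMedium);
  assert(watermark % kSpecialized == 0);
  CarveResult r;
  while (watermark % chunk_size != 0) {
    // Prefer a small padding chunk where the watermark is small-aligned,
    // otherwise fall back to a specialized one.
    const bool small_fits = chunk_size > kSmall && watermark % kSmall == 0;
    const std::size_t pad = small_fits ? kSmall : kSpecialized;
    r.padding.push_back(pad); // a real allocator would add this chunk to the freelist
    watermark += pad;
  }
  r.chunk_start = watermark;
  r.new_watermark = watermark + chunk_size;
  return r;
}
```

With these assumed sizes, asking for a medium chunk while the watermark sits
at 1K would first carve one specialized and then fifteen small padding chunks,
and the medium chunk would start at 32K.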

> I think the naming "padding chunks" is a bit misleading.
> It sounds as if the chunks would be wasted, but as they
> are added to the free lists they are not lost.
> dict.leo gives "offcut" for "Verschnitt" ... not a word
> common to me, but at least the German translation and the
> word-for-word translation better fit the situation I think.
> Feel free to keep it as is, though.
>

I agree. "Alignment chunks"?

>
> In your mail you are discussing the additional fields you
> add. In case adding _is_class to metachunk is considered
> a problem (I don't think so), can't you compute the property
> "is_class()" by comparing the metachunk address with the
> possible range of the compressed class space? These 3GB are
> only reserved for the class space ...
>
>
Sure, that would be possible.
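
Roughly like this, as a sketch only. ReservedRange and is_in_class_space()
are hypothetical names, and how the reserved range of the compressed class
space would actually be obtained is left open here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the reserved compressed class space range.
struct ReservedRange {
  std::uint64_t base; // first address of the range
  std::uint64_t size; // length of the range in bytes
};

// A chunk would count as a class space chunk if its address falls in the range.
inline bool is_in_class_space(const ReservedRange& ccs, std::uint64_t chunk_addr) {
  assert(ccs.size > 0);
  return chunk_addr >= ccs.base && chunk_addr - ccs.base < ccs.size;
}
```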

> TestVirtualSpaceNode_test() is empty. Maybe remove it altogether?
>
>
Makes sense.

> A lot of the methods are passed 'true' or 'false' to indicate
> whether it is for the class or metaspace manager. Maybe you
> could define enum is_class and is_metaspace or the like, to
> make these calls more self-explanatory?
>
>
There is already one, "MetadataType". One could use that throughout the
code.

However, there already was a mixture of "MetadataType" and "bool is_class"
predating this patch - so, my patch did not add to the confusion, I just
chose one of the prevalent forms. Unifying those two forms makes sense and
can be done in a later cleanup (or? Opinions?).
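
For illustration, the two styles side by side. The enum below is only a
stand-in with illustrative enumerators, medium_chunk_bytes() is a hypothetical
helper, and the sizes are placeholders.

```cpp
#include <cstddef>

// Stand-in for the existing MetadataType enum; the enumerators are illustrative.
enum class MetadataType { ClassType, NonClassType };

// Hypothetical helper in the "bool is_class" style. Sizes are placeholders.
constexpr std::size_t medium_chunk_bytes(bool is_class) {
  return is_class ? 32 * 1024 : 64 * 1024;
}

// The same helper in the enum style: the call site names what it means.
constexpr std::size_t medium_chunk_bytes(MetadataType mdtype) {
  return mdtype == MetadataType::ClassType ? 32 * 1024 : 64 * 1024;
}
```

A call like medium_chunk_bytes(true) says nothing about what "true" means,
whereas medium_chunk_bytes(MetadataType::ClassType) documents itself.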

> Minor nit: as you are normalizing #defines to ASSERT anyway, you
> might want to fix the remaining two or three #defines in metaspace.cpp
> from PRODUCT to ASSERT/DEBUG, too.
>
>
Sure!

> Best regards,
>   Goetz.
>
>
I'll wait a bit to see if more opinions are forthcoming; if not, I'll
prepare a new patch based on your suggestions.

Thanks again for the review work,

Best Regards, Thomas

> -----Original Message-----
> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf
> Of Thomas Stüfe
> Sent: Thursday, February 8, 2018 12:58 PM
> To: HotSpot Open Source Developers <hotspot-dev at openjdk.java.net>
> Subject: RFR: Proposal for improvements to the metaspace chunk allocator
>
> Hi,
>
> We would like to contribute a patch developed at SAP which has been live in
> our VM for some time. It improves the metaspace chunk allocation: reduces
> fragmentation and raises the chance of reusing free metaspace chunks.
>
> The patch: http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-05--2/webrev/
>
> In short, this patch helps with a number of pathological cases where
> metaspace chunks are free but cannot be reused because they are of the
> wrong size. For example, the metaspace freelist could be full of small
> chunks, which would not be reusable if we need larger chunks. So, we could
> get metaspace OOMs even in situations where the metaspace was far from
> exhausted. Our patch adds the ability to split and merge metaspace chunks
> dynamically and thus remove the "size-lock-in" problem.
>
> Note that there have been other attempts to get a grip on this problem, see
> e.g. "SpaceManager::get_small_chunk_and_allocate()". But arguably our
> patch attempts a more complete solution.
>
> In 2016 I discussed the idea for this patch with some folks off-list, among
> them Jon Masamitsu. He advised me to create a JEP. So I did: [1].
> However, meanwhile changes to the JEP process were discussed [2], and I am
> not sure anymore that this patch even needs a JEP. It may be moderately
> complex and hence carries the risk inherent in any patch, but its effects
> would not be externally visible (if you discount seeing fewer metaspace
> OOMs). So, I'd prefer to handle this as a simple RFE.
>
> --
>
> How this patch works:
>
> 1) When a class loader dies, its metaspace chunks are freed and returned to
> the freelist for reuse by the next class loader. With the patch, upon
> returning a chunk to the freelist, an attempt is made to merge it with its
> neighboring chunks - should they happen to be free too - to form a larger
> chunk, which is then placed in the free list.
>
> As a result, the freelist should be populated by larger chunks at the
> expense of smaller chunks. In other words, all free chunks should always be
> as "coalesced as possible".
>
> 2) When a class loader needs a new chunk and a chunk of the requested size
> cannot be found in the free list, before carving out a new chunk from the
> virtual space, we first check if there is a larger chunk in the free list.
> If there is, that larger chunk is chopped up into n smaller chunks. One of
> them is returned to the caller, the others are re-added to the freelist.
>
> (1) and (2) together have the effect of removing the size-lock-in for
> chunks. If fragmentation allows it, small chunks are dynamically combined
> to form larger chunks, and larger chunks are split on demand.
>
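
As an illustration of the splitting step (2) above, a minimal sketch follows.
It is not code from the patch: the Chunk struct, split_chunk() and the
byte sizes (class-space-like 1:2:32 ratio) are assumptions, and the remainder
is simply filled with the largest pieces that stay aligned to their own size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative chunk sizes in bytes; assumptions, not taken from the patch.
constexpr std::size_t kSpecialized = 1024;
constexpr std::size_t kSmall = 2 * kSpecialized;
constexpr std::size_t kMedium = 32 * kSpecialized;

struct Chunk {
  std::size_t start; // offset within the space
  std::size_t size;  // one of the three sizes above
};

// Split `big` into one chunk of `target_size` followed by the largest
// chunks that keep every piece aligned to its own size.
inline std::vector<Chunk> split_chunk(const Chunk& big, std::size_t target_size) {
  assert(target_size < big.size);
  assert(big.start % big.size == 0);
  std::vector<Chunk> pieces;
  pieces.push_back({big.start, target_size}); // this piece goes to the caller
  std::size_t pos = big.start + target_size;
  const std::size_t end = big.start + big.size;
  while (pos < end) {
    // In a real allocator the remainder pieces would be re-added to the freelist.
    const bool small_fits = kSmall < big.size && pos % kSmall == 0 && pos + kSmall <= end;
    const std::size_t size = small_fits ? kSmall : kSpecialized;
    pieces.push_back({pos, size});
    pos += size;
  }
  return pieces;
}
```

Under these assumptions, taking a specialized chunk out of a free medium chunk
yields the requested piece plus one specialized and fifteen small leftovers.
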
> --
>
> What this patch does not do:
>
> This is not a rewrite of the chunk allocator - most of the mechanisms stay
> intact. Specifically, chunk sizes remain unchanged, and so do chunk
> allocation processes (when do which class loaders get handed which chunk
> size). Almost everything this patch does affects only internal workings of
> the ChunkManager.
>
> Also note that I refrained from doing any cleanups, since I wanted
> reviewers to be able to gauge this patch without filtering noise.
> Unfortunately this patch adds some complexity. But there are many future
> opportunities for code cleanup and simplification, some of which we already
> discussed in existing RFEs ([3], [4]). All of them are out of the scope for
> this particular patch.
>
> --
>
> Details:
>
> Before the patch, the following rules held:
> - All chunk sizes are multiples of the smallest chunk size ("specialized
> chunks")
> - All chunk sizes of larger chunks are also clean multiples of the next
> smaller chunk size (e.g. for class space, the ratio of
> specialized/small/medium chunks is 1:2:32)
> - All chunk start addresses are aligned to the smallest chunk size (more or
> less accidentally, see metaspace_reserve_alignment).
> The patch makes the last rule explicit and more strict:
> - All (non-humongous) chunk start addresses are now aligned to their own
> chunk size. So, e.g. medium chunks are allocated at addresses which are a
> multiple of medium chunk size. This rule is not extended to humongous
> chunks, whose start addresses continue to be aligned to the smallest chunk
> size.
>
> The reason for this new alignment rule is that it makes it cheap both to
> find the predecessors of a chunk and to check which chunks are free.
>
> When a class loader dies and its chunk is returned to the freelist, all we
> have is its address. In order to merge it with its neighbors to form a
> larger chunk, we need to find those neighbors, including those preceding
> the returned chunk. Prior to this patch that was not easy - one would have
> to iterate chunks starting at the beginning of the VirtualSpaceNode. But
> due to the new alignment rule, we now know where the prospective larger
> chunk must start - at the next lower larger-chunk-size-aligned boundary. We
> also know that currently a smaller chunk must start there (*).
>
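
The address calculation described above can be sketched as follows. This is
an illustration only: enclosing_chunk_start() is a hypothetical helper, and it
assumes chunk sizes are powers of two, which holds for sizes such as 1K, 2K
and 32K.

```cpp
#include <cassert>
#include <cstdint>

inline bool is_power_of_two(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Start of the prospective larger chunk that would contain `chunk_addr`:
// the next lower boundary aligned to the larger chunk size.
inline std::uint64_t enclosing_chunk_start(std::uint64_t chunk_addr,
                                           std::uint64_t larger_chunk_size) {
  assert(is_power_of_two(larger_chunk_size));
  return chunk_addr & ~(larger_chunk_size - 1);
}
```

For a returned chunk at 0x12800 and a larger chunk size of 0x8000 (32K), the
prospective larger chunk would span 0x10000 up to, but not including, 0x18000.
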
> In order to check the free-ness of chunks quickly, each VirtualSpaceNode
> now keeps a bitmap which describes its occupancy. One bit in this bitmap
> corresponds to a range the size of the smallest chunk size and starting at
> an address aligned to the smallest chunk size. Because of the alignment
> rules above, such a range belongs to one single chunk. The bit is 1 if the
> associated chunk is in use by a class loader, 0 if it is free.
>
> When we have calculated the address range a prospective larger chunk would
> span, we now need to check if all chunks in that range are free. Only then
> can we merge them. We do that by querying the bitmap. Note that the most
> common use case here is forming medium chunks from smaller chunks. With the
> new alignment rules, the bitmap portion covering a medium chunk now always
> happens to be 16 or 32 bits in size and is 16- or 32-bit aligned, so reading
> the bitmap in many cases becomes a simple 16- or 32-bit load.
>
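
A minimal sketch of such a bitmap query, for illustration only. It is not the
OccupancyMap class from the patch: OccupancySketch and its methods are made-up
names, and it assumes one bit per 1K with a 32K medium chunk, so that a medium
chunk maps to exactly one aligned 32-bit word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kSpecialized = 1024;         // bytes covered by one bit (assumed)
constexpr std::size_t kMedium = 32 * kSpecialized; // 32 bits per medium chunk (assumed)

class OccupancySketch {
 public:
  explicit OccupancySketch(std::size_t space_bytes)
      : words_(space_bytes / kMedium, 0u) {
    assert(space_bytes % kMedium == 0);
  }

  // Mark [start, start + size) as in use (true) or free (false).
  void set_in_use(std::size_t start, std::size_t size, bool in_use) {
    assert(start % kSpecialized == 0 && size % kSpecialized == 0);
    const std::size_t last = (start + size) / kSpecialized;
    for (std::size_t bit = start / kSpecialized; bit < last; ++bit) {
      const std::uint32_t mask = std::uint32_t{1} << (bit % 32);
      if (in_use) {
        words_[bit / 32] |= mask;
      } else {
        words_[bit / 32] &= ~mask;
      }
    }
  }

  // A medium-sized, medium-aligned range is free if its 32-bit word is zero.
  bool is_medium_range_free(std::size_t start) const {
    assert(start % kMedium == 0);
    return words_[start / kMedium] == 0;
  }

 private:
  std::vector<std::uint32_t> words_; // one word per medium-sized range
};
```

The point of the alignment rule shows in is_medium_range_free(): no loop over
chunks is needed, a single word comparison answers the question.
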
> Only if the range is free do we need to iterate the chunks in that
> range: pull them from the freelist, combine them into one new larger chunk,
> and re-add that one to the freelist.
>
> (*) Humongous chunks make this a bit more complicated. Since the new
> alignment rule does not extend to them, a humongous chunk could still
> straddle the lower or upper boundary of the prospective larger chunk. So I
> gave the occupancy map a second layer, which is used to mark the start of
> chunks.
> An alternative approach could have been to make the size and start address
> of humongous chunks always a multiple of the largest non-humongous chunk size
> (medium chunks). That would have caused a bit of waste per humongous chunk
> (<64K) in exchange for simpler coding and a simpler occupancy map.
>
> --
>
> The patch shows its best results in scenarios where a lot of smallish class
> loaders are alive simultaneously. When dying, they leave continuous
> expanses of metaspace covered in small chunks, which can be merged nicely.
> However, if class loader life times vary more, we have more interleaving of
> dead and alive small chunks, and hence chunk merging does not work as well
> as it could.
>
> For an example of a pathological case like this see example program: [5]
>
> Executed like this: "java -XX:CompressedClassSpaceSize=10M -cp test3
> test3.Example2" the test will load 3000 small classes in separate class
> loaders, then throw them away and start loading large classes. The small
> classes will have flooded the metaspace with small chunks, which are
> unusable for the large classes. When executing with the rather limited
> CompressedClassSpaceSize=10M, we will run into an OOM after loading about
> 800 large classes, having used only 40% of the class space; the rest is
> wasted on unused small chunks. However, with our patch the example program
> will manage to allocate ~2900 large classes before running into an OOM, and
> class space will show almost no waste.
>
> To demonstrate this, add -Xlog:gc+metaspace+freelist. After running into an
> OOM, statistics and an ASCII representation of the class space will be
> shown. The unpatched version will show large expanses of unused small
> chunks, the patched variant will show almost no waste.
>
> Note that the patch could be made more effective with a different size
> ratio between small and medium chunks: in class space, that ratio is 1:16,
> so 16 small chunks must happen to be free to form one larger chunk. With a
> smaller ratio the chance for coalescation would be larger. So there may be
> room for future improvement here: Since we now can merge and split chunks
> on demand, we could introduce more chunk sizes. Potentially arriving at a
> buddy-ish allocator style where we drop hard-wired chunk sizes for a
> dynamic model where the ratio between chunk sizes is always 1:2 and we
> could in theory have no limit to the chunk size? But this is just a thought
> and well out of the scope of this patch.
>
> --
>
> What does this patch cost (memory):
>
>  - the occupancy bitmap adds 1 byte per 4K metaspace.
>  - MetaChunk headers get larger, since we add an enum and two bools to them.
> Depending on what the C++ compiler does with that, chunk headers grow by
> one or two MetaWords, reducing the payload size by that amount.
> - The new alignment rules mean we may need to create padding chunks to
> precede larger chunks. But since these padding chunks are added to the
> freelist, they should be used up before the need for new padding chunks
> arises. So, the maximally possible number of unused padding chunks should
> be limited by design to about 64K.
>
> The expectation is that the memory savings from this patch far outweigh its
> added memory costs.
>
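
The "1 byte per 4K" figure for the occupancy bitmap can be checked with a
small calculation. The sketch assumes one bit per 1K range (the smallest
chunk size) and two bitmap layers; occupancy_map_bytes() is a hypothetical
helper, not part of the patch.

```cpp
#include <cstddef>

// Assumptions for this estimate: one bit per 1K range, two bitmap layers.
constexpr std::size_t kBytesPerBit = 1024;
constexpr std::size_t kLayers = 2;

// Bitmap size in bytes for a space of `space_bytes` bytes.
constexpr std::size_t occupancy_map_bytes(std::size_t space_bytes) {
  return space_bytes / kBytesPerBit * kLayers / 8;
}

// 4K of metaspace: 4 bits per layer, 2 layers, 8 bits in total.
static_assert(occupancy_map_bytes(4096) == 1, "1 byte per 4K");
```

Under these assumptions, a 10M class space would need 2560 bytes of bitmap
and a 1G class space 256K.
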
> What does this patch cost (performance):
>
> We did not see measurable drops in standard benchmarks rising above the
> normal noise. I also measured times for a program which stresses metaspace
> chunk coalescation, with the same result.
>
> I am open to suggestions what else I should measure, and/or independent
> measurements.
>
> --
>
> Other details:
>
> I removed SpaceManager::get_small_chunk_and_allocate() to reduce complexity
> somewhat, because it was made mostly obsolete by this patch: since small
> chunks are combined to larger chunks upon return to the freelist, in theory
> we should not have that many free small chunks anymore anyway. However,
> there may be still cases where we could benefit from this workaround, so I
> am asking your opinion on this one.
>
> About tests: There were two native tests - ChunkManagerReturnTest and
> TestVirtualSpaceNode (the former was added by me last year) - which did not
> make much sense anymore, since they relied heavily on internal behavior
> which was made unpredictable with this patch.
> To make up for these lost tests, I added a new gtest which attempts to
> stress the many combinations of allocation patterns but does so from a layer
> above the old tests. It now uses Metaspace::allocate() and friends. By
> using that point as entry for tests, I am less dependent on implementation
> internals and still cover a lot of scenarios.
>
> --
>
> Review pointers:
>
> Good points to start are
> - ChunkManager::return_single_chunk() - specifically,
> ChunkManager::attempt_to_coalesce_around_chunk() - here we merge chunks
> upon return to the free list
> - ChunkManager::free_chunks_get(): Here we now split large chunks into
> smaller chunks on demand
> - VirtualSpaceNode::take_from_committed(): chunks are allocated according
> to the alignment rules now, padding chunks are handled
> - The OccupancyMap class is the helper class implementing the new occupancy
> bitmap
>
> The rest is mostly chaff: helper functions, added tests and verifications.
>
> --
>
> Thanks and Best Regards, Thomas
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8166690
> [2] http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November/000128.html
> [3] https://bugs.openjdk.java.net/browse/JDK-8185034
> [4] https://bugs.openjdk.java.net/browse/JDK-8176808
> [5] https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip
>