RFR(L): 8198423: Improve metaspace chunk allocation (was: Proposal for improvements to the metaspace chunk allocator)
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Mar 8 17:18:07 UTC 2018
Hey Erik, welcome to the party :)
We were about to push the change (Coleen is running some final tests) but I
am happy to wait a bit more, the more eyes the better.
Further comments inline:
On Thu, Mar 8, 2018 at 5:14 PM, Erik Helin <erik.helin at oracle.com> wrote:
> Alright, I know I am a bit late to the party, but I have now refreshed my
> knowledge of Metaspace enough to begin reviewing this. I will probably ask
> a few questions to get a better understanding of your changes :)
>
> First of all, great work! The code is easy to read and the concepts are
> clear, so thank you for that.
>
>
Thanks!
Still, the code has gotten quite bloated, and my change is not helping. I have
some patches in the works to reduce complexity.
An initial comment:
>
> 1592 // Chunks are born as in-use (see MetaChunk ctor). So, before returning
> 1593 // the padding chunk to its chunk manager, mark it as in use (ChunkManager
> 1594 // will assert that).
> 1595 do_update_in_use_info_for_chunk(padding_chunk, true);
>
> This comment is slightly hard to read. I _think_ what you are meaning is
> something like:
>
> // So, before returning the padding chunk to its chunk manager,
> // mark it as in use in the occupancy map.
>
> Is that correct?
Correct. Yes, the comment is a bit convoluted. I meant: the chunk needs to
appear to ChunkManager::return_chunk() as a valid in-use chunk, because that
is what it expects.
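
To make this concrete, the sequence is roughly this (just a sketch - the
creation of the padding chunk is omitted, and the actual entry point in the
webrev may be ChunkManager::return_single_chunk(); this is illustrative only):

   // A padding chunk was just carved out to restore alignment. Chunks are
   // born as in-use (see MetaChunk ctor), but the occupancy map must say
   // so too, because the chunk manager asserts that it is handed a valid
   // in-use chunk.
   do_update_in_use_info_for_chunk(padding_chunk, true);
   chunk_manager->return_chunk(padding_chunk);  // lands in the freelist
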
> Besides updating the occupancy map, do_update_in_use_info_for_chunk will
> also call MetaChunk::set_is_tagged_free, but that is not strictly needed
> here, right?
>
>
Yes, this is correct.
> Thanks,
> Erik
>
> On 02/28/2018 05:17 PM, Thomas Stüfe wrote:
>
>> Hi Erik, no problem!
>>
>> Thanks, Thomas
>>
>> On Wed, Feb 28, 2018 at 4:28 PM, Erik Helin <erik.helin at oracle.com> wrote:
>>
>> Hi Thomas,
>>
>> I will take a look at this, I just have been a bit busy lately
>> (sorry for not responding earlier).
>>
>> Thanks,
>> Erik
>>
>>
>> On 02/26/2018 03:20 PM, Thomas Stüfe wrote:
>>
>> Hi all,
>>
>> I know this patch is a bit larger, but may I please have reviews
>> and/or
>> other input?
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8198423
>> Latest version:
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/
>>
>> For those who followed the mail thread, this is the incremental diff to the
>> last changes (it includes the feedback Goetz gave me on- and off-list):
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev-incr/webrev/
>>
>> Thank you!
>>
>> Kind Regards, Thomas Stuefe
>>
>>
>>
>> On Thu, Feb 8, 2018 at 12:58 PM, Thomas Stüfe
>> <thomas.stuefe at gmail.com> wrote:
>>
>> Hi,
>>
>> We would like to contribute a patch developed at SAP which
>> has been live
>> in our VM for some time. It improves the metaspace chunk
>> allocation:
>> reduces fragmentation and raises the chance of reusing free
>> metaspace
>> chunks.
>>
>> The patch:
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-05--2/webrev/
>>
>> In very short, this patch helps with a number of
>> pathological cases where
>> metaspace chunks are free but cannot be reused because they
>> are of the
>> wrong size. For example, the metaspace freelist could be
>> full of small
>> chunks, which would not be reusable if we need larger
>> chunks. So, we could
>> get metaspace OOMs even in situations where the metaspace
>> was far from
>> exhausted. Our patch adds the ability to split and merge
>> metaspace chunks
>> dynamically and thus removes the "size-lock-in" problem.
>>
>> Note that there have been other attempts to get a grip on
>> this problem,
>> see e.g. "SpaceManager::get_small_chunk_and_allocate()".
>> But arguably
>> our patch attempts a more complete solution.
>>
>> In 2016 I discussed the idea for this patch with some folks
>> off-list,
>> among them Jon Masamitsu. He advised me to create a JEP, so I did: [1].
>> However, changes to the JEP process have since been discussed [2], and
>> I am not sure anymore that this patch even needs a JEP. It may be
>> moderately complex and hence carries the risk inherent in
>> any patch, but
>> its effects would not be externally visible (if you discount
>> seeing fewer
>> metaspace OOMs). So, I'd prefer to handle this as a simple
>> RFE.
>>
>> --
>>
>> How this patch works:
>>
>> 1) When a class loader dies, its metaspace chunks are freed
>> and returned
>> to the freelist for reuse by the next class loader. With the
>> patch, upon
>> returning a chunk to the freelist, an attempt is made to merge it with its
>> neighboring chunks - should they happen to be free too - to form a larger
>> chunk, which is then placed in the free list.
>>
>> As a result, the freelist should be populated by larger
>> chunks at the
>> expense of smaller chunks. In other words, all free chunks
>> should always be
>> as "coalesced as possible".
>>
>> 2) When a class loader needs a new chunk and a chunk of the
>> requested size
>> cannot be found in the free list, before carving out a new
>> chunk from the
>> virtual space, we first check if there is a larger chunk in
>> the free list.
>> If there is, that larger chunk is chopped up into n smaller
>> chunks. One of
>> them is returned to the caller, the others are re-added to
>> the freelist.
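>>
>> Again as a sketch (the real code is ChunkManager::free_chunks_get();
>> the helper names here are invented):
>>
>>   // Sketch only: allocation path with on-demand splitting.
>>   Metachunk* get_chunk(size_t requested_word_size) {
>>     Metachunk* chunk = take_from_freelist(requested_word_size);
>>     if (chunk == NULL) {
>>       // Nothing of the requested size is free; look for a larger free
>>       // chunk and chop it up. One piece is handed out, the rest goes
>>       // back to the freelist.
>>       Metachunk* larger = take_larger_chunk_from_freelist(requested_word_size);
>>       if (larger != NULL) {
>>         chunk = split_chunk(larger, requested_word_size);
>>       }
>>     }
>>     return chunk;  // still NULL -> carve a new chunk from virtual space
>>   }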
>>
>> (1) and (2) together have the effect of removing the
>> size-lock-in for
>> chunks. If fragmentation allows it, small chunks are
>> dynamically combined
>> to form larger chunks, and larger chunks are split on demand.
>>
>> --
>>
>> What this patch does not:
>>
>> This is not a rewrite of the chunk allocator - most of the
>> mechanisms stay
>> intact. Specifically, chunk sizes remain unchanged, and so do the chunk
>> allocation rules (which class loader gets handed which chunk size, and
>> when). Almost everything this patch does affects only
>> internal workings of
>> the ChunkManager.
>>
>> Also note that I refrained from doing any cleanups, since I
>> wanted
>> reviewers to be able to gauge this patch without filtering
>> noise.
>> Unfortunately this patch adds some complexity. But there are
>> many future
>> opportunities for code cleanup and simplification, some of
>> which we already
>> discussed in existing RFEs ([3], [4]). All of them are out of scope for
>> this particular patch.
>>
>> --
>>
>> Details:
>>
>> Before the patch, the following rules held:
>> - All chunk sizes are multiples of the smallest chunk size
>> ("specialized
>> chunks")
>> - All chunk sizes of larger chunks are also clean multiples
>> of the next
>> smaller chunk size (e.g. for class space, the ratio of
>> specialized/small/medium chunks is 1:2:32)
>> - All chunk start addresses are aligned to the smallest
>> chunk size (more
>> or less accidentally, see metaspace_reserve_alignment).
>> The patch makes the last rule explicit and more strict:
>> - All (non-humongous) chunk start addresses are now aligned
>> to their own
>> chunk size. So, e.g. medium chunks are allocated at
>> addresses which are a
>> multiple of medium chunk size. This rule is not extended to
>> humongous
>> chunks, whose start addresses continue to be aligned to the
>> smallest chunk
>> size.
>>
>> The reason for this new alignment rule is that it makes it
>> cheap both to
>> find chunk predecessors of a chunk and to check which chunks
>> are free.
>>
>> When a class loader dies and its chunk is returned to the
>> freelist, all we
>> have is its address. In order to merge it with its neighbors
>> to form a
>> larger chunk, we need to find those neighbors, including
>> those preceding
>> the returned chunk. Prior to this patch that was not easy -
>> one would have
>> to iterate chunks starting at the beginning of the
>> VirtualSpaceNode. But
>> due to the new alignment rule, we now know where the
>> prospective larger
>> chunk must start - at the next lower
>> larger-chunk-size-aligned boundary. We
>> also know that currently a smaller chunk must start there (*).
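>>
>> In code, finding that boundary is a single alignment operation, roughly
>> (sketch; align_down() is the existing HotSpot helper, the variable names
>> are made up):
>>
>>   // Where could the enclosing larger (e.g. medium) chunk start? Due to
>>   // the new alignment rule, only at the next lower boundary aligned to
>>   // the larger chunk size.
>>   MetaWord* const candidate_area_start =
>>     align_down(returned_chunk_start, larger_chunk_word_size * BytesPerWord);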
>>
>> In order to check the free-ness of chunks quickly, each
>> VirtualSpaceNode
>> now keeps a bitmap which describes its occupancy. One bit in
>> this bitmap
>> corresponds to a range the size of the smallest chunk size
>> and starting at
>> an address aligned to the smallest chunk size. Because of
>> the alignment
>> rules above, such a range belongs to one single chunk. The
>> bit is 1 if the
>> associated chunk is in use by a class loader, 0 if it is free.
>>
>> When we have calculated the address range a prospective
>> larger chunk would
>> span, we now need to check if all chunks in that range are
>> free. Only then can we merge them. We do that by querying the bitmap. Note
>> that the most
>> common use case here is forming medium chunks from smaller
>> chunks. With the
>> new alignment rules, the bitmap portion covering a medium
>> chunk now always
>> happens to be 16 or 32 bits in size and 16- or 32-bit aligned, so reading
>> the bitmap in many cases becomes a simple 16- or 32-bit load.
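>>
>> To illustrate the idea, a much-simplified sketch of that bitmap (the
>> real class is OccupancyMap, see the review pointers below; names and
>> layout here are not the real ones):
>>
>>   // One bit per smallest-chunk-sized range: 1 = in use, 0 = free.
>>   class OccupancyMapSketch {
>>     uint8_t*  _map;                       // the bitmap itself
>>     MetaWord* _base;                      // bottom of the VirtualSpaceNode
>>     size_t    _smallest_chunk_word_size;  // words covered by one bit
>>     size_t bit_for(const MetaWord* p) const {
>>       return (size_t)(p - _base) / _smallest_chunk_word_size;
>>     }
>>    public:
>>     void set_in_use(const MetaWord* p, bool in_use) {
>>       const size_t n = bit_for(p);
>>       if (in_use) { _map[n / 8] |=  (uint8_t)(1u << (n % 8)); }
>>       else        { _map[n / 8] &= (uint8_t)~(1u << (n % 8)); }
>>     }
>>     // Is [p, p + word_size) completely free? With the alignment rules a
>>     // medium-sized area maps to 16 or 32 consecutive, aligned bits, so
>>     // the real implementation can often use a single 16/32-bit load.
>>     bool is_range_free(const MetaWord* p, size_t word_size) const {
>>       for (size_t n = bit_for(p); n < bit_for(p + word_size); n++) {
>>         if (_map[n / 8] & (1u << (n % 8))) return false;
>>       }
>>       return true;
>>     }
>>   };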
>>
>> Only if the range is free do we need to iterate the chunks in that
>> range: pull them from the freelist, combine them into one new larger chunk,
>> and re-add that one to the freelist.
>>
>> (*) Humongous chunks make this a bit more complicated. Since
>> the new
>> alignment rule does not extend to them, a humongous chunk
>> could still
>> straddle the lower or upper boundary of the prospective
>> larger chunk. So I
>> gave the occupancy map a second layer, which is used to mark
>> the start of
>> chunks.
>> An alternative approach could have been to make humongous chunk size and
>> start address always a multiple of the largest non-humongous
>> chunk size
>> (medium chunks). That would have caused a bit of waste per
>> humongous chunk
>> (<64K) in exchange for simpler coding and a simpler
>> occupancy map.
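>>
>> With that second layer the merge check becomes, roughly (sketch,
>> extending the class above with an additional _starts_map bitmap; all
>> names invented):
>>
>>     uint8_t* _starts_map;  // second layer: bit set where a chunk starts
>>     bool chunk_starts_at_address(const MetaWord* p) const {
>>       const size_t n = bit_for(p);
>>       return (_starts_map[n / 8] & (1u << (n % 8))) != 0;
>>     }
>>     // A candidate area may only be merged if no chunk straddles its
>>     // boundaries: a chunk must start exactly at the area start, and a
>>     // chunk must start exactly at the area end (unless the area ends at
>>     // the top of the node).
>>     bool is_area_mergeable(const MetaWord* area, size_t word_size,
>>                            const MetaWord* node_top) const {
>>       const MetaWord* const end = area + word_size;
>>       return is_range_free(area, word_size) &&
>>              chunk_starts_at_address(area) &&
>>              (end == node_top || chunk_starts_at_address(end));
>>     }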
>>
>> --
>>
>> The patch shows its best results in scenarios where a lot of
>> smallish
>> class loaders are alive simultaneously. When dying, they leave contiguous
>> expanses of metaspace covered in small chunks, which can be merged nicely.
>> However, if class loader lifetimes vary more, we have more
>> interleaving of
>> dead and alive small chunks, and hence chunk merging does
>> not work as well
>> as it could.
>>
>> For an example of a pathological case like this, see the example program: [5]
>>
>> Executed like this: "java -XX:CompressedClassSpaceSize=10M
>> -cp test3
>> test3.Example2" the test will load 3000 small classes in
>> separate class
>> loaders, then throw them away and start loading large
>> classes. The small
>> classes will have flooded the metaspace with small chunks,
>> which are
>> unusable for the large classes. When executing with the
>> rather limited
>> CompressedClassSpaceSize=10M, we will run into an OOM after
>> loading about
>> 800 large classes, having used only 40% of the class space; the rest is
>> wasted on unused small chunks. However, with our patch the
>> example program
>> will manage to allocate ~2900 large classes before running
>> into an OOM, and
>> class space will show almost no waste.
>>
>> To demonstrate this, add -Xlog:gc+metaspace+freelist. After
>> running into
>> an OOM, statistics and an ASCII representation of the class
>> space will be
>> shown. The unpatched version will show large expanses of
>> unused small
>> chunks, the patched variant will show almost no waste.
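>>
>> For reference, the complete command line for the demo would then be
>> something like:
>>
>>   java -XX:CompressedClassSpaceSize=10M -Xlog:gc+metaspace+freelist -cp test3 test3.Example2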
>>
>> Note that the patch could be made more effective with a
>> different size
>> ratio between small and medium chunks: in class space, that
>> ratio is 1:16,
>> so 16 small chunks must happen to be free to form one larger
>> chunk. With a
>> smaller ratio the chance for coalescation would be larger.
>> So there may be
>> room for future improvement here: since we can now merge and split chunks
>> on demand, we could introduce more chunk sizes, potentially arriving at a
>> buddy-style allocator where we drop hard-wired chunk sizes in favor of a
>> dynamic model in which the ratio between chunk sizes is always 1:2 and
>> there is, in theory, no upper limit to the chunk size. But this is just a
>> thought and well out of the scope of this patch.
>>
>> --
>>
>> What does this patch cost (memory):
>>
>> - the occupancy bitmap adds 1 byte per 4K metaspace.
>> - MetaChunk headers get larger, since we add an enum and two bools to them.
>> Depending on what the C++ compiler does with that, chunk headers grow by
>> one or two MetaWords, reducing the payload size by that amount.
>> - The new alignment rules mean we may need to create padding
>> chunks to
>> precede larger chunks. But since these padding chunks are
>> added to the
>> freelist, they should be used up before the need for new
>> padding chunks
>> arises. So, the maximum possible number of unused padding
>> chunks should
>> be limited by design to about 64K.
>>
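>> (Back-of-the-envelope check for the first point: assuming the smallest
>> chunk size is 128 MetaWords - 1K on 64-bit - and two bitmap layers, the
>> map needs 2 bits per 1K of metaspace, i.e. 1 byte per 4K.)
>>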
>> The expectation is that the memory savings from this patch far outweigh its
>> added memory cost.
>>
>> .. (performance):
>>
>> We did not see measurable drops in standard benchmarks rising above the
>> normal noise. I also measured times for a program which
>> stresses metaspace
>> chunk coalescation, with the same result.
>>
>> I am open to suggestions on what else I should measure, and/or
>> independent
>> measurements.
>>
>> --
>>
>> Other details:
>>
>> I removed SpaceManager::get_small_chunk_and_allocate() to
>> reduce
>> complexity somewhat, because it was made mostly obsolete by
>> this patch:
>> since small chunks are combined to larger chunks upon return
>> to the
>> freelist, in theory we should not have that many free small
>> chunks anymore
>> anyway. However, there may still be cases where we could
>> benefit from this
>> workaround, so I am asking your opinion on this one.
>>
>> About tests: There were two native tests -
>> ChunkManagerReturnTest and
>> TestVirtualSpaceNode (the former was added by me last year)
>> - which did not
>> make much sense anymore, since they relied heavily on
>> internal behavior
>> which was made unpredictable with this patch.
>> To make up for these lost tests, I added a new gtest which
>> attempts to
>> stress the many combinations of allocation patterns, but does so from a
>> layer above the old tests. It now uses Metaspace::allocate() and friends.
>> By using that point as the entry for tests, I am less dependent on
>> implementation internals and still cover a lot of scenarios.
>>
>> --
>>
>> Review pointers:
>>
>> Good points to start are
>> - ChunkManager::return_single_chunk() - specifically,
>> ChunkManager::attempt_to_coalesce_around_chunk() - here we
>> merge chunks
>> upon return to the free list
>> - ChunkManager::free_chunks_get(): Here we now split large
>> chunks into
>> smaller chunks on demand
>> - VirtualSpaceNode::take_from_committed(): chunks are allocated
>> according to the new alignment rules; padding chunks are handled here
>> - The OccupancyMap class is the helper class implementing
>> the new
>> occupancy bitmap
>>
>> The rest is mostly chaff: helper functions, added tests and
>> verifications.
>>
>> --
>>
>> Thanks and Best Regards, Thomas
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8166690
>> [2] http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November/000128.html
>> [3] https://bugs.openjdk.java.net/browse/JDK-8185034
>> [4] https://bugs.openjdk.java.net/browse/JDK-8176808
>> [5] https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip
>>