RFR(L): 8198423: Improve metaspace chunk allocation (was: Proposal for improvements to the metaspace chunk allocator)
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Mar 8 17:18:07 UTC 2018
Hey Erik, welcome to the party :)
We were about to push the change (Coleen is running some final tests) but I
am happy to wait a bit more, the more eyes the better.
Further comments inline:
On Thu, Mar 8, 2018 at 5:14 PM, Erik Helin <erik.helin at oracle.com> wrote:
> Alright, I know I am a bit late to the party, but I have now refreshed my
> knowledge of Metaspace enough to begin reviewing this. I will probably ask
> a few questions to get a better understanding of your changes :)
>
> First of all, great work! The code is easy to read and the concepts are
> clear, so thank you for that.
>
>
Thanks!
Still, the code has gotten quite bloated, and my change is not helping. I have
some patches in the works to reduce complexity.
An initial comment:
>
> 1592 // Chunks are born as in-use (see MetaChunk ctor). So, before returning
> 1593 // the padding chunk to its chunk manager, mark it as in use (ChunkManager
> 1594 // will assert that).
> 1595 do_update_in_use_info_for_chunk(padding_chunk, true);
>
> This comment is slightly hard to read. I _think_ what you are meaning is
> something like:
>
> // So, before returning the padding chunk to its chunk manager,
> // mark it as in use in the occupancy map.
>
> Is that correct?
Correct. Yes, the comment is a bit convoluted. I meant: the chunk needs to
appear to ChunkManager::return_chunk() as a valid in-use chunk, because that
is what it expects.
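
To make this concrete, the sequence is roughly this (just a sketch - the
creation of the padding chunk is omitted, and the actual entry point in the
webrev may be ChunkManager::return_single_chunk(); this is illustrative only):

   // A padding chunk was just carved out to restore alignment. Chunks are
   // born as in-use (see MetaChunk ctor), but the occupancy map must say
   // so too, because the chunk manager asserts that it is handed a valid
   // in-use chunk.
   do_update_in_use_info_for_chunk(padding_chunk, true);
   chunk_manager->return_chunk(padding_chunk);  // lands in the freelist
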
> Besides updating the occupancy map, do_update_in_use_info_for_chunk will
> also call MetaChunk::set_is_tagged_free, but that is not strictly needed
> here, right?
>
>
Yes, this is correct.
> Thanks,
> Erik
>
> On 02/28/2018 05:17 PM, Thomas Stüfe wrote:
>
>> Hi Erik, no problem!
>>
>> Thanks, Thomas
>>
>> On Wed, Feb 28, 2018 at 4:28 PM, Erik Helin <erik.helin at oracle.com> wrote:
>>
>> Hi Thomas,
>>
>> I will take a look at this, I just have been a bit busy lately
>> (sorry for not responding earlier).
>>
>> Thanks,
>> Erik
>>
>>
>> On 02/26/2018 03:20 PM, Thomas Stüfe wrote:
>>
>> Hi all,
>>
>> I know this patch is a bit larger, but may I please have reviews
>> and/or
>> other input?
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8198423
>> Latest version:
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/
>>
>> For those who followed the mail thread, this is the incremental diff to the
>> last changes (it includes the feedback Goetz gave me on- and off-list):
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev-incr/webrev/
>>
>> Thank you!
>>
>> Kind Regards, Thomas Stuefe
>>
>>
>>
>> On Thu, Feb 8, 2018 at 12:58 PM, Thomas Stüfe
>> <thomas.stuefe at gmail.com> wrote:
>>
>> Hi,
>>
>> We would like to contribute a patch developed at SAP which
>> has been live
>> in our VM for some time. It improves the metaspace chunk
>> allocation:
>> reduces fragmentation and raises the chance of reusing free
>> metaspace
>> chunks.
>>
>> The patch:
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-05--2/webrev/
>>
>> In very short, this patch helps with a number of
>> pathological cases where
>> metaspace chunks are free but cannot be reused because they
>> are of the
>> wrong size. For example, the metaspace freelist could be
>> full of small
>> chunks, which would not be reusable if we need larger
>> chunks. So, we could
>> get metaspace OOMs even in situations where the metaspace
>> was far from
>> exhausted. Our patch adds the ability to split and merge
>> metaspace chunks
>> dynamically and thus removes the "size-lock-in" problem.
>>
>> Note that there have been other attempts to get a grip on
>> this problem,
>> see e.g. "SpaceManager::get_small_chunk_and_allocate()".
>> But arguably
>> our patch attempts a more complete solution.
>>
>> In 2016 I discussed the idea for this patch with some folks
>> off-list,
>> among them Jon Masamitsu. He advised me to create a JEP, so I did: [1].
>> However, changes to the JEP process have since been discussed [2], and
>> I am not sure anymore that this patch even needs a JEP. It may be
>> moderately complex and hence carries the risk inherent in
>> any patch, but
>> its effects would not be externally visible (if you discount
>> seeing fewer
>> metaspace OOMs). So, I'd prefer to handle this as a simple
>> RFE.
>>
>> --
>>
>> How this patch works:
>>
>> 1) When a class loader dies, its metaspace chunks are freed
>> and returned
>> to the freelist for reuse by the next class loader. With the
>> patch, upon
>> returning a chunk to the freelist, an attempt is made to merge it with its
>> neighboring chunks - should they happen to be free too - to form a larger
>> chunk, which is then placed in the free list.
>>
>> As a result, the freelist should be populated by larger
>> chunks at the
>> expense of smaller chunks. In other words, all free chunks
>> should always be
>> as "coalesced as possible".
>>
>> 2) When a class loader needs a new chunk and a chunk of the
>> requested size
>> cannot be found in the free list, before carving out a new
>> chunk from the
>> virtual space, we first check if there is a larger chunk in
>> the free list.
>> If there is, that larger chunk is chopped up into n smaller
>> chunks. One of
>> them is returned to the caller, the others are re-added to
>> the freelist.
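>>
>> Again as a sketch (the real code is ChunkManager::free_chunks_get();
>> the helper names here are invented):
>>
>>   // Sketch only: allocation path with on-demand splitting.
>>   Metachunk* get_chunk(size_t requested_word_size) {
>>     Metachunk* chunk = take_from_freelist(requested_word_size);
>>     if (chunk == NULL) {
>>       // Nothing of the requested size is free; look for a larger free
>>       // chunk and chop it up. One piece is handed out, the rest goes
>>       // back to the freelist.
>>       Metachunk* larger = take_larger_chunk_from_freelist(requested_word_size);
>>       if (larger != NULL) {
>>         chunk = split_chunk(larger, requested_word_size);
>>       }
>>     }
>>     return chunk;  // still NULL -> carve a new chunk from virtual space
>>   }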
>>
>> (1) and (2) together have the effect of removing the
>> size-lock-in for
>> chunks. If fragmentation allows it, small chunks are
>> dynamically combined
>> to form larger chunks, and larger chunks are split on demand.
>>
>> --
>>
>> What this patch does not:
>>
>> This is not a rewrite of the chunk allocator - most of the
>> mechanisms stay
>> intact. Specifically, chunk sizes remain unchanged, and so do the chunk
>> allocation rules (which class loader gets handed which chunk size, and
>> when). Almost everything this patch does affects only
>> internal workings of
>> the ChunkManager.
>>
>> Also note that I refrained from doing any cleanups, since I
>> wanted
>> reviewers to be able to gauge this patch without filtering
>> noise.
>> Unfortunately this patch adds some complexity. But there are
>> many future
>> opportunities for code cleanup and simplification, some of
>> which we already
>> discussed in existing RFEs ([3], [4]). All of them are out of scope for
>> this particular patch.
>>
>> --
>>
>> Details:
>>
>> Before the patch, the following rules held:
>> - All chunk sizes are multiples of the smallest chunk size
>> ("specialized
>> chunks")
>> - All chunk sizes of larger chunks are also clean multiples
>> of the next
>> smaller chunk size (e.g. for class space, the ratio of
>> specialized/small/medium chunks is 1:2:32)
>> - All chunk start addresses are aligned to the smallest
>> chunk size (more
>> or less accidentally, see metaspace_reserve_alignment).
>> The patch makes the last rule explicit and more strict:
>> - All (non-humongous) chunk start addresses are now aligned
>> to their own
>> chunk size. So, e.g. medium chunks are allocated at
>> addresses which are a
>> multiple of medium chunk size. This rule is not extended to
>> humongous
>> chunks, whose start addresses continue to be aligned to the
>> smallest chunk
>> size.
>>
>> The reason for this new alignment rule is that it makes it
>> cheap both to
>> find chunk predecessors of a chunk and to check which chunks
>> are free.
>>
>> When a class loader dies and its chunk is returned to the
>> freelist, all we
>> have is its address. In order to merge it with its neighbors
>> to form a
>> larger chunk, we need to find those neighbors, including
>> those preceding
>> the returned chunk. Prior to this patch that was not easy -
>> one would have
>> to iterate chunks starting at the beginning of the
>> VirtualSpaceNode. But
>> due to the new alignment rule, we now know where the
>> prospective larger
>> chunk must start - at the next lower
>> larger-chunk-size-aligned boundary. We
>> also know that currently a smaller chunk must start there (*).
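>>
>> In code, finding that boundary is a single alignment operation, roughly
>> (sketch; align_down() is the existing HotSpot helper, the variable names
>> are made up):
>>
>>   // Where could the enclosing larger (e.g. medium) chunk start? Due to
>>   // the new alignment rule, only at the next lower boundary aligned to
>>   // the larger chunk size.
>>   MetaWord* const candidate_area_start =
>>     align_down(returned_chunk_start, larger_chunk_word_size * BytesPerWord);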
>>
>> In order to check the free-ness of chunks quickly, each
>> VirtualSpaceNode
>> now keeps a bitmap which describes its occupancy. One bit in
>> this bitmap
>> corresponds to a range the size of the smallest chunk size
>> and starting at
>> an address aligned to the smallest chunk size. Because of
>> the alignment
>> rules above, such a range belongs to one single chunk. The
>> bit is 1 if the
>> associated chunk is in use by a class loader, 0 if it is free.
>>
>> When we have calculated the address range a prospective
>> larger chunk would
>> span, we now need to check if all chunks in that range are
>> free. Only then can we merge them. We do that by querying the bitmap. Note
>> that the most
>> common use case here is forming medium chunks from smaller
>> chunks. With the
>> new alignment rules, the bitmap portion covering a medium
>> chunk now always
>> happens to be 16 or 32 bits in size and 16- or 32-bit aligned, so reading
>> the bitmap in many cases becomes a simple 16- or 32-bit load.
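>>
>> To illustrate the idea, a much-simplified sketch of that bitmap (the
>> real class is OccupancyMap, see the review pointers below; names and
>> layout here are not the real ones):
>>
>>   // One bit per smallest-chunk-sized range: 1 = in use, 0 = free.
>>   class OccupancyMapSketch {
>>     uint8_t*  _map;                       // the bitmap itself
>>     MetaWord* _base;                      // bottom of the VirtualSpaceNode
>>     size_t    _smallest_chunk_word_size;  // words covered by one bit
>>     size_t bit_for(const MetaWord* p) const {
>>       return (size_t)(p - _base) / _smallest_chunk_word_size;
>>     }
>>    public:
>>     void set_in_use(const MetaWord* p, bool in_use) {
>>       const size_t n = bit_for(p);
>>       if (in_use) { _map[n / 8] |=  (uint8_t)(1u << (n % 8)); }
>>       else        { _map[n / 8] &= (uint8_t)~(1u << (n % 8)); }
>>     }
>>     // Is [p, p + word_size) completely free? With the alignment rules a
>>     // medium-sized area maps to 16 or 32 consecutive, aligned bits, so
>>     // the real implementation can often use a single 16/32-bit load.
>>     bool is_range_free(const MetaWord* p, size_t word_size) const {
>>       for (size_t n = bit_for(p); n < bit_for(p + word_size); n++) {
>>         if (_map[n / 8] & (1u << (n % 8))) return false;
>>       }
>>       return true;
>>     }
>>   };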
>>
>> Only if the range is free do we need to iterate the chunks in that
>> range: pull them from the freelist, combine them into one new larger chunk,
>> and re-add that one to the freelist.
>>
>> (*) Humongous chunks make this a bit more complicated. Since
>> the new
>> alignment rule does not extend to them, a humongous chunk
>> could still
>> straddle the lower or upper boundary of the prospective
>> larger chunk. So I
>> gave the occupancy map a second layer, which is used to mark
>> the start of
>> chunks.
>> An alternative approach could have been to make humongous chunk size and
>> start address always a multiple of the largest non-humongous
>> chunk size
>> (medium chunks). That would have caused a bit of waste per
>> humongous chunk
>> (<64K) in exchange for simpler coding and a simpler
>> occupancy map.
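>>
>> With that second layer the merge check becomes, roughly (sketch,
>> extending the class above with an additional _starts_map bitmap; all
>> names invented):
>>
>>     uint8_t* _starts_map;  // second layer: bit set where a chunk starts
>>     bool chunk_starts_at_address(const MetaWord* p) const {
>>       const size_t n = bit_for(p);
>>       return (_starts_map[n / 8] & (1u << (n % 8))) != 0;
>>     }
>>     // A candidate area may only be merged if no chunk straddles its
>>     // boundaries: a chunk must start exactly at the area start, and a
>>     // chunk must start exactly at the area end (unless the area ends at
>>     // the top of the node).
>>     bool is_area_mergeable(const MetaWord* area, size_t word_size,
>>                            const MetaWord* node_top) const {
>>       const MetaWord* const end = area + word_size;
>>       return is_range_free(area, word_size) &&
>>              chunk_starts_at_address(area) &&
>>              (end == node_top || chunk_starts_at_address(end));
>>     }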
>>
>> --
>>
>> The patch shows its best results in scenarios where a lot of
>> smallish
>> class loaders are alive simultaneously. When dying, they leave contiguous
>> expanses of metaspace covered in small chunks, which can be merged nicely.
>> However, if class loader lifetimes vary more, we have more
>> interleaving of
>> dead and alive small chunks, and hence chunk merging does
>> not work as well
>> as it could.
>>
>> For an example of a pathological case like this, see the example program: [5]
>>
>> Executed like this: "java -XX:CompressedClassSpaceSize=10M
>> -cp test3
>> test3.Example2" the test will load 3000 small classes in
>> separate class
>> loaders, then throw them away and start loading large
>> classes. The small
>> classes will have flooded the metaspace with small chunks,
>> which are
>> unusable for the large classes. When executing with the
>> rather limited
>> CompressedClassSpaceSize=10M, we will run into an OOM after
>> loading about
>> 800 large classes, having used only 40% of the class space; the rest is
>> wasted on unused small chunks. However, with our patch the
>> example program
>> will manage to allocate ~2900 large classes before running
>> into an OOM, and
>> class space will show almost no waste.
>>
>> To demonstrate this, add -Xlog:gc+metaspace+freelist. After
>> running into
>> an OOM, statistics and an ASCII representation of the class
>> space will be
>> shown. The unpatched version will show large expanses of
>> unused small
>> chunks, the patched variant will show almost no waste.
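>>
>> For reference, the complete command line for the demo would then be
>> something like:
>>
>>   java -XX:CompressedClassSpaceSize=10M -Xlog:gc+metaspace+freelist -cp test3 test3.Example2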
>>
>> Note that the patch could be made more effective with a
>> different size
>> ratio between small and medium chunks: in class space, that
>> ratio is 1:16,
>> so 16 small chunks must happen to be free to form one larger
>> chunk. With a
>> smaller ratio the chance for coalescation would be larger.
>> So there may be
>> room for future improvement here: since we can now merge and split chunks
>> on demand, we could introduce more chunk sizes, potentially arriving at a
>> buddy-style allocator where we drop hard-wired chunk sizes in favor of a
>> dynamic model in which the ratio between chunk sizes is always 1:2 and
>> there is, in theory, no upper limit to the chunk size. But this is just a
>> thought and well out of the scope of this patch.
>>
>> --
>>
>> What does this patch cost (memory):
>>
>> - the occupancy bitmap adds 1 byte per 4K metaspace.
>> - MetaChunk headers get larger, since we add an enum and two bools to them.
>> Depending on what the C++ compiler does with that, chunk headers grow by
>> one or two MetaWords, reducing the payload size by that amount.
>> - The new alignment rules mean we may need to create padding
>> chunks to
>> precede larger chunks. But since these padding chunks are
>> added to the
>> freelist, they should be used up before the need for new
>> padding chunks
>> arises. So, the maximum possible number of unused padding
>> chunks should
>> be limited by design to about 64K.
>>
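>> (Back-of-the-envelope check for the first point: assuming the smallest
>> chunk size is 128 MetaWords - 1K on 64-bit - and two bitmap layers, the
>> map needs 2 bits per 1K of metaspace, i.e. 1 byte per 4K.)
>>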
>> The expectation is that the memory savings from this patch far outweigh its
>> added memory cost.
>>
>> .. (performance):
>>
>> We did not see measurable drops in standard benchmarks rising above the
>> normal noise. I also measured times for a program which
>> stresses metaspace
>> chunk coalescation, with the same result.
>>
>> I am open to suggestions on what else I should measure, and/or
>> independent
>> measurements.
>>
>> --
>>
>> Other details:
>>
>> I removed SpaceManager::get_small_chunk_and_allocate() to
>> reduce
>> complexity somewhat, because it was made mostly obsolete by
>> this patch:
>> since small chunks are combined to larger chunks upon return
>> to the
>> freelist, in theory we should not have that many free small
>> chunks anymore
>> anyway. However, there may still be cases where we could
>> benefit from this
>> workaround, so I am asking your opinion on this one.
>>
>> About tests: There were two native tests -
>> ChunkManagerReturnTest and
>> TestVirtualSpaceNode (the former was added by me last year)
>> - which did not
>> make much sense anymore, since they relied heavily on
>> internal behavior
>> which was made unpredictable with this patch.
>> To make up for these lost tests, I added a new gtest which
>> attempts to
>> stress the many combinations of allocation patterns, but does so from a
>> layer above the old tests. It now uses Metaspace::allocate() and friends.
>> By using that point as the entry for tests, I am less dependent on
>> implementation internals and still cover a lot of scenarios.
>>
>> --
>>
>> Review pointers:
>>
>> Good points to start are
>> - ChunkManager::return_single_chunk() - specifically,
>> ChunkManager::attempt_to_coalesce_around_chunk() - here we
>> merge chunks
>> upon return to the free list
>> - ChunkManager::free_chunks_get(): Here we now split large
>> chunks into
>> smaller chunks on demand
>> - VirtualSpaceNode::take_from_committed(): chunks are allocated
>> according to the new alignment rules; padding chunks are handled here
>> - The OccupancyMap class is the helper class implementing
>> the new
>> occupancy bitmap
>>
>> The rest is mostly chaff: helper functions, added tests and
>> verifications.
>>
>> --
>>
>> Thanks and Best Regards, Thomas
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8166690
>> [2] http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November/000128.html
>> [3] https://bugs.openjdk.java.net/browse/JDK-8185034
>> [4] https://bugs.openjdk.java.net/browse/JDK-8176808
>> [5] https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip
>>