RFR(L): 8198423: Improve metaspace chunk allocation (was: Proposal for improvements to the metaspace chunk allocator)
coleen.phillimore at oracle.com
Mon Mar 5 23:59:22 UTC 2018
Hi Thomas,
I've read through the new code. I don't have any substantive comments.
Thank you for adding the functions.
Has this been tested on any 32-bit platforms? I will sponsor this when
you have another reviewer.
Thanks for taking on the metaspace!
Coleen
On 3/1/18 5:36 AM, Thomas Stüfe wrote:
> Hi Coleen,
>
> thanks a lot for the review and the sponsoring offer!
>
> New version (full):
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-full/webrev/
> incremental:
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-incr/webrev/
>
> Please find remarks inline:
>
>
> On Tue, Feb 27, 2018 at 11:22 PM, <coleen.phillimore at oracle.com> wrote:
>
>
> Thomas, review comments:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metachunk.hpp.udiff.html
>
> +// ChunkIndex (todo: rename?) defines the type of chunk. Chunk types
>
>
> It's really both, isn't it? The type is the index into the free
> list or in-use lists. The name seems fine.
>
>
> You are right. What I meant was that a lot of code needs to know about
> the different chunk sizes, but by naming it "Index" and adding enum
> values like "NumberOfFreeLists" we expose implementation details no-one
> outside of SpaceManager and ChunkManager cares about (namely, the fact
> that these values are internally used as indices into arrays). A more
> neutral naming would be something like "enum ChunkTypes { spec, small,
> ..., NumberOfNonHumongousChunkTypes, NumberOfChunkTypes }".
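>
> Something like this, perhaps (just a sketch of the naming idea, not
> actual patch code):
>
>   enum ChunkTypes {
>     spec,                            // smallest chunk type
>     small,
>     medium,
>     NumberOfNonHumongousChunkTypes,  // == 3
>     humongous = NumberOfNonHumongousChunkTypes,
>     NumberOfChunkTypes               // == 4
>   };
>
> The values would still work as array indices internally, but the names
> would no longer advertise that fact.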
>
> However, I can leave this out for a possible future cleanup. The
> change is big enough as it is.
>
> Can you add comments on the #endifs if the #ifdef is more than
> 2-3 lines above (it's a nit that bothers me).
>
> +#ifdef ASSERT
> + // A 32bit sentinel for debugging purposes.
> +#define CHUNK_SENTINEL 0x4d4554EF // "MET"
> +#define CHUNK_SENTINEL_INVALID 0xFEEEEEEF
> + uint32_t _sentinel;
> +#endif
> + const ChunkIndex _chunk_type;
> + const bool _is_class;
> + // Whether the chunk is free (in freelist) or in use by some
> class loader.
> bool _is_tagged_free;
> +#ifdef ASSERT
> + ChunkOrigin _origin;
> + int _use_count;
> +#endif
> +
>
>
> I removed the #ifdef ASSERT guards completely, following your suggestion
> below that "origin" would be valuable in customer scenarios too. By that
> logic, the other members are valuable too: the sentinel is valuable when
> examining memory dumps to see the start of chunks, and so is the in-use
> counter. What do you think?
>
> So, I leave the members in - which, depending on what the C++ compiler
> does to enums and bools, may cost up to 128 bits of additional header
> space. I think that is ok. In one of my earlier versions of this patch
> I hand-crafted the header using chars and bitfields to be as small as
> possible, but that seemed over-engineered.
>
> However, the automatic verifications accessing these debug members are
> still only done in debug builds.
>
>
> It seems that you could move _origin and _use_count into the
> ASSERT block above (maybe putting _use_count before _origin).
>
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metaspace.cpp.udiff.html
>
> In take_from_committed, can the allocation of padding chunks be
> its own function like add_chunks_to_alignment() lines 1574-1615?
> The function is too long now.
>
>
> I moved the padding chunk allocation into its own function as you
> suggested.
>
> I don't think coalescation is a word in English, at least my
> dictionary cannot find it. Although it makes sense in the
> context, just distracting.
>
>
> I replaced "coalescation" with "chunk merging" throughout the code.
> Also less of a tongue-twister.
>
> + // Now check if in the coalescation area there are still life
> chunks.
>
>
> "live" chunks I guess. A sentence you won't read often :).
>
>
> Now that I read it, it almost sounded sinister :) Fixed.
>
>
> In free_chunks_get() can you handle the Humongous case first? The
> else for humongous chunk size is buried tons of lines below.
>
> Otherwise it might help the logic to make your addition
> to this function a separate function you call, like
> chunk = split_from_larger_free_chunk();
>
>
> I did the latter. I moved the splitting of a larger chunk to its own
> function. This causes a slight logic change: the new function
> (ChunkManager::split_chunk()) splits an existing large free chunk
> into n smaller free chunks and adds them all back to the freelist -
> that includes the chunk we are about to return. That allows us to use
> the same exit path - which removes the chunk from the freelist and
> adjusts all counters - in the caller function
> "ChunkManager::free_chunks_get" instead of having to return in the
> middle of the function.
>
> To make the code more readable, I also moved the
> "test-that-free-chunks-are-optimally-merged" verification - which was
> quite lengthy - out of VirtualSpaceNode::verify() into a new function,
> VirtualSpaceNode::verify_free_chunks_are_ideally_merged().
>
>
> You might want to keep the origin in product mode if it doesn't
> add to the chunk footprint. Might help with customer debugging.
>
>
> See above
>
> Awesome looking test...
>
>
> Thanks, I was worried it would be too complicated.
> I changed it a bit because there were sporadic errors - not "real"
> errors, just the test itself was faulty: the "metaspaces_in_use"
> counter was slightly wrong in one corner case.
>
> I've read through most of this and thank you for adding this to at
> least partially solve the fragmentation problem. The irony is that
> we templatized the Dictionary from CMS so that we could use it for
> Metaspace, and that has splitting and coalescing, but it seems this
> code makes more sense than adapting that code (if that's even
> possible).
>
>
> Well, it helps other metadata use cases too, no?
>
>
> Thank you for working on this. I'll sponsor this for you.
>
> Coleen
>
>
>
> Thanks again!
>
> I also updated my jdk-submit branch to include these latest changes;
> tests are still running.
>
> Kind Regards, Thomas
>
>
> On 2/26/18 9:20 AM, Thomas Stüfe wrote:
>
> Hi all,
>
> I know this patch is a bit large, but may I please have reviews
> and/or other input?
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8198423
> Latest version:
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/
>
> For those who followed the mail thread, this is the incremental
> diff to the last changes (it includes feedback Goetz gave me on-
> and off-list):
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev-incr/webrev/
>
> Thank you!
>
> Kind Regards, Thomas Stuefe
>
>
>
> On Thu, Feb 8, 2018 at 12:58 PM, Thomas Stüfe
> <thomas.stuefe at gmail.com> wrote:
>
> Hi,
>
> We would like to contribute a patch developed at SAP which
> has been live
> in our VM for some time. It improves metaspace chunk allocation:
> it reduces fragmentation and raises the chance of reusing
> free metaspace
> chunks.
>
> The patch:
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-05--2/webrev/
>
> In short, this patch helps with a number of
> pathological cases where
> metaspace chunks are free but cannot be reused because
> they are of the
> wrong size. For example, the metaspace freelist could be
> full of small
> chunks, which would not be reusable if we need larger
> chunks. So, we could
> get metaspace OOMs even in situations where the metaspace
> was far from
> exhausted. Our patch adds the ability to split and merge metaspace
> chunks dynamically and thus removes the "size-lock-in" problem.
>
> Note that there have been other attempts to get a grip on
> this problem,
> see e.g. "SpaceManager::get_small_chunk_and_allocate()".
> But arguably
> our patch attempts a more complete solution.
>
> In 2016 I discussed the idea for this patch with some folks off-list,
> among them Jon Masamitsu. He advised me to create a JEP, so I did [1].
> However, changes to the JEP process have meanwhile been discussed [2],
> and I am not sure anymore that this patch even needs a JEP. It may be
> moderately complex and hence carries the risk inherent in any patch,
> but its effects would not be externally visible (if you discount
> seeing fewer metaspace OOMs). So, I'd prefer to handle this as a
> simple RFE.
>
> --
>
> How this patch works:
>
> 1) When a class loader dies, its metaspace chunks are
> freed and returned
> to the freelist for reuse by the next class loader. With
> the patch, upon
> returning a chunk to the freelist, an attempt is made to
> merge it with its
> neighboring chunks - should they happen to be free too -
> to form a larger
> chunk, which is then placed in the free list.
>
> As a result, the freelist should be populated by larger
> chunks at the
> expense of smaller chunks. In other words, all free chunks
> should always be
> as "coalesced as possible".
>
> 2) When a class loader needs a new chunk and a chunk of
> the requested size
> cannot be found in the free list, before carving out a new
> chunk from the
> virtual space, we first check if there is a larger chunk
> in the free list.
> If there is, that larger chunk is chopped up into n
> smaller chunks. One of
> them is returned to the caller, the others are re-added
> to the freelist.
>
> (1) and (2) together have the effect of removing the
> size-lock-in for
> chunks. If fragmentation allows it, small chunks are
> dynamically combined
> to form larger chunks, and larger chunks are split on demand.
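>
> And the split side of (2), again as a toy sketch (invented names,
> power-of-two sizes; in the real patch the pieces are proper chunk
> sizes, and the caller's normal exit path removes the returned chunk
> from the freelist):
>
>   #include <cassert>
>   #include <cstdint>
>   #include <map>
>
>   struct ToySplitter {
>     std::map<uint64_t, uint64_t> free_chunks;  // start -> size, in units
>
>     // Halve the free chunk at 'start' until a piece of 'wanted' units
>     // exists; halving keeps every piece aligned to its own size. All
>     // pieces, including the wanted one, stay in the freelist for now.
>     void split_chunk(uint64_t start, uint64_t wanted) {
>       uint64_t size = free_chunks[start];
>       while (size > wanted) {
>         size /= 2;
>         free_chunks[start] = size;         // lower half
>         free_chunks[start + size] = size;  // upper half remains free
>       }
>     }
>
>     bool take(uint64_t wanted, uint64_t* out_start) {
>       for (std::map<uint64_t, uint64_t>::iterator it = free_chunks.begin();
>            it != free_chunks.end(); ++it) {
>         if (it->second >= wanted) {
>           if (it->second > wanted) split_chunk(it->first, wanted);
>           *out_start = it->first;
>           free_chunks.erase(it->first);    // the common exit path
>           return true;
>         }
>       }
>       return false;                        // caller carves a new chunk
>     }
>   };
>
>   int main() {
>     ToySplitter cm;
>     cm.free_chunks[0] = 4;                 // one free "medium" chunk
>     uint64_t s = 0;
>     bool ok = cm.take(1, &s);
>     assert(ok && s == 0);                  // got a 1-unit chunk at 0
>     assert(cm.free_chunks.size() == 2);    // [1,1) and [2,2) remain free
>     return 0;
>   }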
>
> --
>
> What this patch does not do:
>
> This is not a rewrite of the chunk allocator - most of the
> mechanisms stay
> intact. Specifically, chunk sizes remain unchanged, and so
> do chunk
> allocation processes (when which class loaders get handed which
> chunk size). Almost everything this patch does affects only the
> internal workings of the ChunkManager.
>
> Also note that I refrained from doing any cleanups, since
> I wanted
> reviewers to be able to gauge this patch without having to filter
> out noise.
> Unfortunately this patch adds some complexity. But there
> are many future
> opportunities for code cleanup and simplification, some of
> which we already
> discussed in existing RFEs ([3], [4]). All of them are out
> of scope for
> this particular patch.
>
> --
>
> Details:
>
> Before the patch, the following rules held:
> - All chunk sizes are multiples of the smallest chunk size
> ("specialized
> chunks")
> - All chunk sizes of larger chunks are also clean
> multiples of the next
> smaller chunk size (e.g. for class space, the ratio of
> specialized/small/medium chunks is 1:2:32)
> - All chunk start addresses are aligned to the smallest
> chunk size (more
> or less accidentally, see metaspace_reserve_alignment).
> The patch makes the last rule explicit and more strict:
> - All (non-humongous) chunk start addresses are now
> aligned to their own
> chunk size. So, e.g. medium chunks are allocated at
> addresses which are a
> multiple of medium chunk size. This rule is not extended
> to humongous
> chunks, whose start addresses continue to be aligned to
> the smallest chunk
> size.
>
> The reason for this new alignment rule is that it makes it
> cheap both to
> find the predecessors of a chunk and to check which
> chunks are free.
>
> When a class loader dies and its chunk is returned to the
> freelist, all we
> have is its address. In order to merge it with its
> neighbors to form a
> larger chunk, we need to find those neighbors, including
> those preceding
> the returned chunk. Prior to this patch that was not easy
> - one would have
> to iterate chunks starting at the beginning of the
> VirtualSpaceNode. But
> due to the new alignment rule, we now know where the
> prospective larger
> chunk must start - at the next lower
> larger-chunk-size-aligned boundary. We
> also know that currently a smaller chunk must start there (*).
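>
> In toy code, assuming power-of-two chunk sizes for illustration:
>
>   #include <cassert>
>   #include <cstdint>
>
>   // The only address at which an enclosing larger chunk could start is
>   // the next lower boundary aligned to the larger chunk's size.
>   static uint64_t prospective_merge_start(uint64_t chunk_addr,
>                                           uint64_t larger_chunk_size) {
>     return chunk_addr & ~(larger_chunk_size - 1);
>   }
>
>   int main() {
>     // A small chunk at 0x7000 can only ever be part of a merged
>     // 16K chunk starting at 0x4000.
>     assert(prospective_merge_start(0x7000, 0x4000) == 0x4000);
>     return 0;
>   }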
>
> In order to check the free-ness of chunks quickly, each
> VirtualSpaceNode
> now keeps a bitmap which describes its occupancy. One bit
> in this bitmap
> corresponds to a range the size of the smallest chunk size
> and starting at
> an address aligned to the smallest chunk size. Because of
> the alignment
> rules above, such a range belongs to one single chunk. The
> bit is 1 if the
> associated chunk is in use by a class loader, 0 if it is free.
>
> When we have calculated the address range a prospective
> larger chunk would
> span, we now need to check if all chunks in that range are
> free. Only then can
> we merge them. We do that by querying the bitmap. Note
> that the most
> common use case here is forming medium chunks from smaller
> chunks. With the
> new alignment rules, the bitmap portion covering a medium
> chunk now always
> happens to be 16 or 32 bits in size and is 16- or 32-bit
> aligned, so reading
> the bitmap in many cases becomes a simple 16- or 32-bit load.
>
> Only if the range is free do we need to iterate the chunks in that
> range: pull them from the freelist, combine them into one new larger
> chunk, and re-add that to the freelist.
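>
> A toy sketch of such an occupancy map (illustrative only; the second
> map layer for chunk starts, mentioned below, is omitted):
>
>   #include <cassert>
>   #include <cstddef>
>   #include <cstdint>
>   #include <vector>
>
>   // One bit per smallest-chunk-sized slot; a set bit means the slot
>   // belongs to a chunk in use. For ranges that are word-aligned and
>   // word-sized - the common case described above - the freeness check
>   // degenerates to whole-word loads.
>   struct ToyOccupancyMap {
>     std::vector<uint32_t> bits;  // 32 slots per word
>
>     explicit ToyOccupancyMap(size_t slots) : bits((slots + 31) / 32, 0) {}
>
>     void set_in_use(size_t slot, bool in_use) {
>       uint32_t mask = (uint32_t)1 << (slot % 32);
>       if (in_use) bits[slot / 32] |= mask;
>       else        bits[slot / 32] &= ~mask;
>     }
>
>     // Handles only 32-slot-aligned ranges spanning whole words, e.g. a
>     // medium chunk under the new alignment rules.
>     bool range_is_free(size_t from_slot, size_t num_slots) const {
>       for (size_t w = from_slot / 32; w < (from_slot + num_slots) / 32; w++) {
>         if (bits[w] != 0) return false;
>       }
>       return true;
>     }
>   };
>
>   int main() {
>     ToyOccupancyMap map(128);
>     map.set_in_use(40, true);            // some chunk is in use
>     assert(!map.range_is_free(32, 32));  // its word is not free
>     map.set_in_use(40, false);
>     assert(map.range_is_free(0, 128));   // everything free again
>     return 0;
>   }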
>
> (*) Humongous chunks make this a bit more complicated.
> Since the new
> alignment rule does not extend to them, a humongous chunk
> could still
> straddle the lower or upper boundary of the prospective
> larger chunk. So I
> gave the occupancy map a second layer, which is used to
> mark the start of
> chunks.
> An alternative approach could have been to make humongous
> chunk size and
> start address always a multiple of the largest
> non-humongous chunk size
> (medium chunks). That would have caused a bit of waste per
> humongous chunk
> (<64K) in exchange for simpler coding and a simpler
> occupancy map.
>
> --
>
> The patch shows its best results in scenarios where a lot
> of smallish
> class loaders are alive simultaneously. When dying, they
> leave continuous
> expanses of metaspace covered in small chunks, which can
> be merged nicely.
> However, if class loader lifetimes vary more, we have
> more interleaving of
> dead and alive small chunks, and hence chunk merging does
> not work as well
> as it could.
>
> For an example of a pathological case like this see
> example program: [5]
>
> Executed like this: "java -XX:CompressedClassSpaceSize=10M
> -cp test3
> test3.Example2" the test will load 3000 small classes in
> separate class
> loaders, then throw them away and start loading large
> classes. The small
> classes will have flooded the metaspace with small chunks,
> which are
> unusable for the large classes. When executing with the
> rather limited
> CompressedClassSpaceSize=10M, we will run into an OOM
> after loading about
> 800 large classes, having used only 40% of the class
> space; the rest is
> wasted on unused small chunks. However, with our patch the
> example program
> will manage to allocate ~2900 large classes before running
> into an OOM, and
> class space will show almost no waste.
>
> To demonstrate this, add -Xlog:gc+metaspace+freelist.
> After running into
> an OOM, statistics and an ASCII representation of the
> class space will be
> shown. The unpatched version will show large expanses of
> unused small
> chunks; the patched variant will show almost no waste.
>
> Note that the patch could be made more effective with a
> different size
> ratio between small and medium chunks: in class space,
> that ratio is 1:16,
> so 16 small chunks must happen to be free to form one
> larger chunk. With a
> smaller ratio the chance for coalescation would be larger.
> So there may be
> room for future improvement here: Since we now can merge
> and split chunks
> on demand, we could introduce more chunk sizes.
> Potentially arriving at a
> buddy-ish allocator style where we drop hard-wired chunk
> sizes for a
> dynamic model where the ratio between chunk sizes is
> always 1:2 and we
> could in theory have no limit to the chunk size? But this
> is just a thought
> and well out of scope for this patch.
>
> --
>
> What does this patch cost (memory):
>
> - the occupancy bitmap adds 1 byte per 4K of metaspace (if I am not
> miscounting: on 64-bit the smallest chunk covers 1K, and the map has
> two layers, so 2 bits per 1K = 1 byte per 4K).
> - Metachunk headers get larger, since we add an enum and
> two bools to it.
> Depending on what the C++ compiler does with that, chunk
> headers grow by
> one or two MetaWords, reducing the payload size by that
> amount.
> - The new alignment rules mean we may need to create
> padding chunks to
> precede larger chunks. But since these padding chunks are
> added to the
> freelist, they should be used up before the need for new
> padding chunks
> arises. So, the maximum possible number of unused
> padding chunks should
> be limited by design to about 64K.
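>
> To illustrate the padding chunk mechanics, a compilable toy sketch
> (invented names, toy 1:2:4 chunk sizes):
>
>   #include <cassert>
>   #include <cstdint>
>   #include <utility>
>   #include <vector>
>
>   // Fill the gap [top, aligned_start) with free padding chunks, each
>   // aligned to its own (power-of-two) size.
>   static void add_padding_chunks(
>       uint64_t top, uint64_t aligned_start,
>       std::vector<std::pair<uint64_t, uint64_t> >* out) {
>     while (top < aligned_start) {
>       uint64_t size = 4;  // largest non-humongous size in this model
>       while (size > 1 && (top % size != 0 || top + size > aligned_start)) {
>         size /= 2;
>       }
>       out->push_back(std::make_pair(top, size));  // goes to the freelist
>       top += size;
>     }
>   }
>
>   int main() {
>     std::vector<std::pair<uint64_t, uint64_t> > pads;
>     // Carving a 4-unit chunk while top == 1 first needs padding [1, 4):
>     add_padding_chunks(1, 4, &pads);
>     assert(pads.size() == 2);  // a 1-unit chunk at 1, a 2-unit at 2
>     assert(pads[0].first == 1 && pads[0].second == 1);
>     assert(pads[1].first == 2 && pads[1].second == 2);
>     return 0;
>   }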
>
> The expectation is that the memory savings by this patch
> far outweigh its
> added memory costs.
>
> .. (performance):
>
> We did not see measurable drops in standard benchmarks
> rising above the
> normal noise. I also measured times for a program which
> stresses metaspace
> chunk coalescation, with the same result.
>
> I am open to suggestions as to what else I should measure,
> and/or independent
> measurements.
>
> --
>
> Other details:
>
> I removed SpaceManager::get_small_chunk_and_allocate() to
> reduce
> complexity somewhat, because it was made mostly obsolete
> by this patch:
> since small chunks are combined into larger chunks upon return to the
> freelist, in theory we should not have that many free small chunks
> anymore anyway. However, there may still be cases where we could
> benefit from this
> workaround, so I am asking your opinion on this one.
>
> About tests: There were two native tests -
> ChunkManagerReturnTest and
> TestVirtualSpaceNode (the former was added by me last
> year) - which did not
> make much sense anymore, since they relied heavily on
> internal behavior
> which was made unpredictable with this patch.
> To make up for these lost tests, I added a new gtest
> which attempts to
> stress the many combinations of allocation patterns but
> does so from a layer
> above the old tests. It now uses Metaspace::allocate() and
> friends. By
> using that entry point for tests, I am less dependent
> on implementation
> internals and still cover a lot of scenarios.
>
> --
>
> Review pointers:
>
> Good starting points are
> - ChunkManager::return_single_chunk() - specifically,
> ChunkManager::attempt_to_coalesce_around_chunk() - here we
> merge chunks
> upon return to the free list
> - ChunkManager::free_chunks_get(): Here we now split large
> chunks into
> smaller chunks on demand
> - VirtualSpaceNode::take_from_committed(): chunks are allocated
> according to the alignment rules now, and padding chunks are handled
> - The OccupancyMap class is the helper class implementing
> the new
> occupancy bitmap
>
> The rest is mostly chaff: helper functions, added tests
> and verifications.
>
> --
>
> Thanks and Best Regards, Thomas
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8166690
> [2]
> http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November/000128.html
> [3] https://bugs.openjdk.java.net/browse/JDK-8185034
> [4] https://bugs.openjdk.java.net/browse/JDK-8176808
> [5]
> https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip
>
>
>
>
>