RFR(L): 8198423: Improve metaspace chunk allocation (was: Proposal for improvements to the metaspace chunk allocator)

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Mon Mar 5 23:59:22 UTC 2018


Hi Thomas,

I've read through the new code.  I don't have any substantive comments.  
Thank you for adding the functions.

Has this been tested on any 32-bit platforms?  I will sponsor this when 
you have another reviewer.

Thanks for taking on the metaspace!

Coleen

On 3/1/18 5:36 AM, Thomas Stüfe wrote:
> Hi Coleen,
>
> thanks a lot for the review and the sponsoring offer!
>
> New version (full): 
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-full/webrev/
> incremental: 
> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-incr/webrev/
>
> Please find remarks inline:
>
>
> On Tue, Feb 27, 2018 at 11:22 PM, <coleen.phillimore at oracle.com> wrote:
>
>
>     Thomas, review comments:
>
>     http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metachunk.hpp.udiff.html
>
>     +// ChunkIndex (todo: rename?) defines the type of chunk. Chunk types
>
>
>     It's really both, isn't it?  The type is the index into the free
>     list or the in-use lists.  The name seems fine.
>
>
> You are right. What I meant was that a lot of code needs to know about 
> the different chunk sizes, but by naming it "Index" and adding enum 
> values like "NumberOfFreeLists" we expose implementation details 
> no one outside of SpaceManager and ChunkManager cares about (namely, 
> the fact that these values are internally used as indices into 
> arrays). A more neutral naming would be something like "enum 
> ChunkTypes { spec, small, ..., NumberOfNonHumongousChunkTypes, 
> NumberOfChunkTypes }".
>
> However, I can leave this out for a possible future cleanup. The 
> change is big enough as it is.
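>
> For illustration only, such a neutral naming could look like this
> (hypothetical sketch, not part of this patch):
>
>   enum ChunkTypes {
>     SpecializedChunkType = 0,
>     SmallChunkType,
>     MediumChunkType,
>     NumberOfNonHumongousChunkTypes, // sentinel, not a real type
>     HumongousChunkType = NumberOfNonHumongousChunkTypes,
>     NumberOfChunkTypes
>   };
>
> Outside code would use only the type names; the fact that the values
> double as freelist indices would stay a ChunkManager-internal detail.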
>
>     Can you add comments on the #endifs if the matching #ifdef is
>     more than 2-3 lines above (it's a nit that bothers me).
>
>     +#ifdef ASSERT
>     + // A 32bit sentinel for debugging purposes.
>     +#define CHUNK_SENTINEL 0x4d4554EF // "MET"
>     +#define CHUNK_SENTINEL_INVALID 0xFEEEEEEF
>     + uint32_t _sentinel;
>     +#endif
>     + const ChunkIndex _chunk_type;
>     + const bool _is_class;
>     + // Whether the chunk is free (in freelist) or in use by some
>     class loader.
>        bool _is_tagged_free;
>      +#ifdef ASSERT
>     + ChunkOrigin _origin;
>     + int _use_count;
>     +#endif
>     +
>
>
> I removed the asserts completely, following your suggestion below that 
> "origin" would be valuable in customer scenarios too. By that logic, 
> the other members are valuable too: the sentinel is valuable when 
> examining memory dumps to see the start of chunks, and the in-use 
> counter is useful too. What do you think?
>
> So, I leave the members in - which, depending on what the C++ compiler 
> does with enums and bools, may cost up to 128 bits of additional header 
> space. I think that is ok. In one of my earlier versions of this patch 
> I hand-crafted the header using chars and bitfields to be as small as 
> possible, but that seemed over-engineered.
>
> However, automatic verifications accessing these debug members are 
> still only done in debug builds.
>
>
>     It seems that you could move _origin and _use_count into the
>     ASSERT block above (maybe putting _use_count before _origin).
>
>     http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metaspace.cpp.udiff.html
>
>     In take_from_committed, can the allocation of padding chunks be
>     its own function, like add_chunks_to_alignment(), lines 1574-1615?
>     The function is too long now.
>
>
> I moved the padding chunk allocation into its own function, as you 
> suggested.
>
>     I don't think coalescation is a word in English, at least my
>     dictionary cannot find it.  Although it makes sense in the
>     context, just distracting.
>
>
> I replaced "coalescation" with "chunk merging" throughout the code. 
> It is also less of a tongue twister.
>
>     + // Now check if in the coalescation area there are still life
>     chunks.
>
>
>     "live" chunks I guess.   A sentence you won't read often :).
>
>
> Now that I read it it almost sounded sinister :) Fixed.
>
>
>     In free_chunks_get() can you handle the Humongous case first? The
>     else for humongous chunk size is buried tons of lines below.
>
>     Otherwise it might be helpful to the logic to make your addition
>     to this function be a function you call like
>       chunk = split_from_larger_free_chunk();
>
>
> I did the latter. I moved the splitting of a larger chunk into its own 
> function. This causes a slight logic change: the new function 
> (ChunkManager::split_chunk()) splits an existing large free chunk 
> into n smaller free chunks and adds them all back to the freelist - 
> including the chunk we are about to return. That allows us to use 
> the same exit path - which removes the chunk from the freelist and 
> adjusts all counters - in the caller function 
> "ChunkManager::free_chunks_get", instead of having to return in the 
> middle of the function.
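>
> In outline, the new flow is (a simplified sketch; split_chunk() and
> free_chunks_get() are the real names, everything else is elided):
>
>   Metachunk* ChunkManager::free_chunks_get(size_t word_size) {
>     if (/* no free chunk of the requested size exists */) {
>       // Split a larger free chunk into n smaller free chunks and
>       // re-add all of them - including the one we are about to
>       // return - to the freelist.
>       split_chunk(/* ... */);
>     }
>     // Common exit path: remove the chunk from the freelist,
>     // adjust all counters, return it.
>     return /* chunk taken off the freelist */;
>   }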
>
> To make the code more readable, I also moved the 
> "test-that-free-chunks-are-optimally-merged" verification - which was 
> quite lengthy - from VirtualSpaceNode::verify() into a new function, 
> VirtualSpaceNode::verify_free_chunks_are_ideally_merged().
>
>
>     You might want to keep the origin in product mode if it doesn't
>     add to the chunk footprint.   Might help with customer debugging.
>
>
> See above
>
>     Awesome looking test...
>
>
> Thanks, I was worried it would be too complicated.
> I changed it a bit because there were sporadic failures. Not a "real" 
> error - the test itself was faulty: its "metaspaces_in_use" counter 
> was slightly wrong in one corner case.
>
>     I've read through most of this, and thank you for adding this to at
>     least partially solve the fragmentation problem.  The irony is
>     that we templatized the Dictionary from CMS so that we could use
>     it for Metaspace, and that has splitting and coalescing, but it
>     seems this code makes more sense than adapting that code (if that's
>     even possible).
>
>
> Well, it helps other metadata use cases too, no?
>
>
>     Thank you for working on this.  I'll sponsor this for you.
>
>     Coleen
>
>
>
> Thanks again!
>
> I also updated my jdk-submit branch to include these latest changes; 
> tests are still running.
>
> Kind Regards, Thomas
>
>
>     On 2/26/18 9:20 AM, Thomas Stüfe wrote:
>
>         Hi all,
>
>         I know this patch is a bit large, but may I please have
>         reviews and/or other input?
>
>         Issue: https://bugs.openjdk.java.net/browse/JDK-8198423
>         Latest version:
>         http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/
>
>         For those who followed the mail thread, this is the
>         incremental diff to the last changes (it includes feedback
>         Goetz gave me on- and off-list):
>         http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev-incr/webrev/
>
>         Thank you!
>
>         Kind Regards, Thomas Stuefe
>
>
>
>         On Thu, Feb 8, 2018 at 12:58 PM, Thomas Stüfe
>         <thomas.stuefe at gmail.com> wrote:
>
>             Hi,
>
>             We would like to contribute a patch developed at SAP which
>             has been live in our VM for some time. It improves
>             metaspace chunk allocation: it reduces fragmentation and
>             raises the chance of reusing free metaspace chunks.
>
>             The patch:
>             http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-05--2/webrev/
>
>             In short, this patch helps with a number of pathological
>             cases where metaspace chunks are free but cannot be reused
>             because they are of the wrong size. For example, the
>             metaspace freelist could be full of small chunks, which
>             would not be reusable if we need larger chunks. So, we
>             could get metaspace OOMs even in situations where the
>             metaspace was far from exhausted. Our patch adds the
>             ability to split and merge metaspace chunks dynamically
>             and thus removes the "size-lock-in" problem.
>
>             Note that there have been other attempts to get a grip on
>             this problem, see e.g.
>             "SpaceManager::get_small_chunk_and_allocate()". But
>             arguably our patch attempts a more complete solution.
>
>             In 2016 I discussed the idea for this patch with some
>             folks off-list, among them Jon Masamitsu. He advised me to
>             create a JEP, so I did [1]. However, changes to the JEP
>             process have been discussed in the meantime [2], and I am
>             not sure anymore that this patch even needs a JEP. It may
>             be moderately complex and hence carries the risk inherent
>             in any patch, but its effects would not be externally
>             visible (if you discount seeing fewer metaspace OOMs). So,
>             I'd prefer to handle this as a simple RFE.
>
>             --
>
>             How this patch works:
>
>             1) When a class loader dies, its metaspace chunks are
>             freed and returned to the freelist for reuse by the next
>             class loader. With the patch, upon returning a chunk to
>             the freelist, an attempt is made to merge it with its
>             neighboring chunks - should they happen to be free too -
>             to form a larger chunk, which is then placed in the free
>             list.
>
>             As a result, the freelist should be populated by larger
>             chunks at the
>             expense of smaller chunks. In other words, all free chunks
>             should always be
>             as "coalesced as possible".
>
>             2) When a class loader needs a new chunk and a chunk of
>             the requested size cannot be found in the free list,
>             before carving out a new chunk from the virtual space, we
>             first check if there is a larger chunk in the free list.
>             If there is, that larger chunk is chopped up into n
>             smaller chunks. One of them is returned to the caller; the
>             others are re-added to the freelist.
>
>             (1) and (2) together have the effect of removing the
>             size-lock-in for
>             chunks. If fragmentation allows it, small chunks are
>             dynamically combined
>             to form larger chunks, and larger chunks are split on demand.
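>
>             As a sketch (assumed helper names, heavily
>             simplified):
>
>               // (1) On chunk return: merge with free
>               // neighbors if possible, then put the
>               // (possibly larger) result on the freelist.
>               void return_chunk(Metachunk* c) {
>                 Metachunk* m = merge_with_free_neighbors(c);
>                 add_to_freelist(m);
>               }
>
>               // (2) On chunk request: reuse, else split
>               // a larger free chunk, else carve out new
>               // space - in that order.
>               Metachunk* get_chunk(ChunkIndex type) {
>                 if (Metachunk* c = take_from_freelist(type))
>                   return c;
>                 if (larger_free_chunk_exists(type)) {
>                   split_larger_free_chunk(type); // rest is re-added
>                   return take_from_freelist(type);
>                 }
>                 return carve_from_virtual_space(type);
>               }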
>
>             --
>
>             What this patch does not do:
>
>             This is not a rewrite of the chunk allocator - most of the
>             mechanisms stay intact. Specifically, chunk sizes remain
>             unchanged, and so do the chunk allocation policies (when
>             which class loader gets handed which chunk size). Almost
>             everything this patch does affects only the internal
>             workings of the ChunkManager.
>
>             Also note that I refrained from doing any cleanups, since
>             I wanted reviewers to be able to gauge this patch without
>             having to filter out noise. Unfortunately this patch adds
>             some complexity. But there are many future opportunities
>             for code cleanup and simplification, some of which we have
>             already discussed in existing RFEs ([3], [4]). All of them
>             are out of scope for this particular patch.
>
>             --
>
>             Details:
>
>             Before the patch, the following rules held:
>             - All chunk sizes are multiples of the smallest chunk size
>             ("specialized chunks").
>             - All larger chunk sizes are also clean multiples of the
>             next smaller chunk size (e.g. for class space, the ratio
>             of specialized/small/medium chunks is 1:2:32).
>             - All chunk start addresses are aligned to the smallest
>             chunk size (more or less accidentally, see
>             metaspace_reserve_alignment).
>
>             The patch makes the last rule explicit and more strict:
>             - All (non-humongous) chunk start addresses are now
>             aligned to their own chunk size. So, e.g., medium chunks
>             are allocated at addresses which are a multiple of the
>             medium chunk size. This rule is not extended to humongous
>             chunks, whose start addresses continue to be aligned to
>             the smallest chunk size.
>
>             The reason for this new alignment rule is that it makes it
>             cheap both to find the predecessors of a chunk and to
>             check which chunks are free.
>
>             When a class loader dies and its chunk is returned to the
>             freelist, all we
>             have is its address. In order to merge it with its
>             neighbors to form a
>             larger chunk, we need to find those neighbors, including
>             those preceding
>             the returned chunk. Prior to this patch that was not easy
>             - one would have
>             to iterate chunks starting at the beginning of the
>             VirtualSpaceNode. But
>             due to the new alignment rule, we now know where the
>             prospective larger
>             chunk must start - at the next lower
>             larger-chunk-size-aligned boundary. We
>             also know that currently a smaller chunk must start there (*).
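>
>             A self-contained sketch of that computation
>             (assuming power-of-two chunk byte sizes, as
>             the size ratios above imply):
>
>               #include <stdint.h>
>               // Start of the prospective larger chunk:
>               // the returned chunk's address, aligned
>               // down to the larger chunk's size.
>               static uintptr_t merged_chunk_start(
>                   uintptr_t chunk_addr,
>                   uintptr_t larger_chunk_size) {
>                 return chunk_addr & ~(larger_chunk_size - 1);
>               }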
>
>             In order to check the free-ness of chunks quickly, each
>             VirtualSpaceNode now keeps a bitmap which describes its
>             occupancy. One bit in this bitmap corresponds to a range
>             the size of the smallest chunk, starting at an address
>             aligned to the smallest chunk size. Because of the
>             alignment rules above, such a range belongs to a single
>             chunk. The bit is 1 if the associated chunk is in use by a
>             class loader, 0 if it is free.
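>
>             In code, the mapping could look like this
>             (a sketch with assumed member names, not the
>             actual patch code):
>
>               // One bit per smallest-chunk-sized range,
>               // counted from the node's base address.
>               size_t bit_for(const MetaWord* p) const {
>                 return (p - _base) / _smallest_chunk_word_size;
>               }
>               bool is_in_use(const MetaWord* p) const {
>                 return _map[bit_for(p) / 8] & (1 << (bit_for(p) % 8));
>               }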
>
>             When we have calculated the address range a prospective
>             larger chunk would span, we need to check if all chunks in
>             that range are free; only then can we merge them. We do
>             that by querying the bitmap. Note that the most
>             that the most
>             common use case here is forming medium chunks from smaller
>             chunks. With the
>             new alignment rules, the bitmap portion covering a medium
>             chunk now always
>             happens to be 16- or 32bit in size and is 16- or 32bit
>             aligned, so reading
>             the bitmap in many cases becomes a simple 16- or 32bit load.
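>
>             Worked through for class space with the
>             1:2:32 ratio mentioned above: a medium chunk
>             spans 32 smallest-chunk-sized ranges, i.e.
>             exactly one aligned 32-bit map word, so the
>             "is the whole range free?" check is a single
>             load (sketch; map32 is an assumed 32-bit view
>             of the bitmap, bit_for() as sketched above):
>
>               uint32_t w = map32[bit_for(range_start) / 32];
>               bool range_is_free = (w == 0); // bits are 0 when free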
>
>             Only if the whole range is free do we iterate the chunks
>             in it: pull them from the freelist, combine them into one
>             new larger chunk, and re-add that one to the freelist.
>
>             (*) Humongous chunks make this a bit more complicated.
>             Since the new
>             alignment rule does not extend to them, a humongous chunk
>             could still
>             straddle the lower or upper boundary of the prospective
>             larger chunk. So I
>             gave the occupancy map a second layer, which is used to
>             mark the start of
>             chunks.
>             An alternative approach could have been to make humongous
>             chunk size and start address always a multiple of the
>             largest non-humongous chunk size (medium chunks). That
>             would have caused a bit of waste per humongous chunk
>             (<64K) in exchange for simpler coding and a simpler
>             occupancy map.
>
>             --
>
>             The patch shows its best results in scenarios where a lot
>             of smallish class loaders are alive simultaneously. When
>             dying, they leave contiguous expanses of metaspace covered
>             in small chunks, which can be merged nicely. However, if
>             class loader lifetimes vary more, we have more
>             interleaving of dead and live small chunks, and hence
>             chunk merging does not work as well as it could.
>
>             For an example of a pathological case like this, see the
>             example program at [5].
>
>             Executed like this: "java -XX:CompressedClassSpaceSize=10M
>             -cp test3 test3.Example2", the test will load 3000 small
>             classes in separate class loaders, then throw them away
>             and start loading large classes. The small classes will
>             have flooded the metaspace with small chunks, which are
>             unusable for the large classes. When executing with the
>             rather limited CompressedClassSpaceSize=10M, we will run
>             into an OOM after loading about 800 large classes, having
>             used only 40% of the class space; the rest is wasted on
>             unused small chunks. However, with our patch the example
>             program will manage to allocate ~2900 large classes before
>             running into an OOM, and class space will show almost no
>             waste.
>
>             To demonstrate this, add -Xlog:gc+metaspace+freelist.
>             After running into
>             an OOM, statistics and an ASCII representation of the
>             class space will be
>             shown. The unpatched version will show large expanses of
>             unused small
>             chunks, the patched variant will show almost no waste.
>
>             Note that the patch could be made more effective with a
>             different size
>             ratio between small and medium chunks: in class space,
>             that ratio is 1:16,
>             so 16 small chunks must happen to be free to form one
>             larger chunk. With a
>             smaller ratio the chance for coalescation would be larger.
>             So there may be
>             room for future improvement here: since we now can merge
>             and split chunks on demand, we could introduce more chunk
>             sizes, potentially arriving at a buddy-style allocator
>             where we drop hard-wired chunk sizes for a dynamic model
>             in which the ratio between chunk sizes is always 1:2 and
>             there is in theory no upper limit to the chunk size. But
>             this is just a thought and well out of the scope of this
>             patch.
>
>             --
>
>             What does this patch cost (memory):
>
>             - The occupancy bitmap adds 1 byte per 4K of metaspace
>             (see the arithmetic sketch below).
>             - Metachunk headers get larger, since we add an enum and
>             two bools to them. Depending on what the C++ compiler does
>             with that, chunk headers grow by one or two MetaWords,
>             reducing the payload size by that amount.
>             - The new alignment rules mean we may need to create
>             padding chunks to precede larger chunks. But since these
>             padding chunks are added to the freelist, they should be
>             used up before the need for new padding chunks arises. So,
>             the maximum possible number of unused padding chunks
>             should be limited by design to about 64K.
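>
>             As a sanity check of the first item (assuming
>             the smallest chunk covers 1K):
>
>               2 map layers x 1 bit per 1K range
>                 = 2 bits per 1K
>                 = 1 byte per 4K of metaspace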
>
>             The expectation is that the memory savings from this patch
>             far outweigh its added memory costs.
>
>             .. (performance):
>
>             We did not see measurable regressions in standard
>             benchmarks rising above the normal noise. I also measured
>             times for a program which stresses metaspace chunk
>             coalescation, with the same result.
>
>             I am open to suggestions on what else I should measure,
>             and/or to independent measurements.
>
>             --
>
>             Other details:
>
>             I removed SpaceManager::get_small_chunk_and_allocate() to
>             reduce complexity somewhat, because it was made mostly
>             obsolete by this patch: since small chunks are combined
>             into larger chunks upon return to the freelist, in theory
>             we should not have that many free small chunks anymore
>             anyway. However, there may still be cases where we could
>             benefit from this workaround, so I am asking for your
>             opinion on this one.
>
>             About tests: There were two native tests -
>             ChunkManagerReturnTest and
>             TestVirtualSpaceNode (the former was added by me last
>             year) - which did not
>             make much sense anymore, since they relied heavily on
>             internal behavior
>             which was made unpredictable with this patch.
>             To make up for these lost tests, I added a new gtest which
>             attempts to stress the many combinations of allocation
>             patterns, but does so from a layer above the old tests: it
>             uses Metaspace::allocate() and friends. By using that
>             point as the entry for tests, I am less dependent on
>             implementation internals and still cover a lot of
>             scenarios.
>
>             --
>
>             Review pointers:
>
>             Good points to start are
>             - ChunkManager::return_single_chunk() - specifically,
>             ChunkManager::attempt_to_coalesce_around_chunk() - here we
>             merge chunks
>             upon return to the free list
>             - ChunkManager::free_chunks_get(): Here we now split large
>             chunks into
>             smaller chunks on demand
>             - VirtualSpaceNode::take_from_committed(): chunks are
>             allocated according to the new alignment rules, and
>             padding chunks are handled here
>             - The OccupancyMap class is the helper class implementing
>             the new
>             occupancy bitmap
>
>             The rest is mostly chaff: helper functions, added tests
>             and verifications.
>
>             --
>
>             Thanks and Best Regards, Thomas
>
>             [1] https://bugs.openjdk.java.net/browse/JDK-8166690
>             [2] http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November/000128.html
>             [3] https://bugs.openjdk.java.net/browse/JDK-8185034
>             [4] https://bugs.openjdk.java.net/browse/JDK-8176808
>             [5] https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip
>
>
>
>
>


