RFR(L): 8198423: Improve metaspace chunk allocation (was: Proposal for improvements to the metaspace chunk allocator)
coleen.phillimore at oracle.com
Tue Mar 6 18:00:05 UTC 2018
On 3/6/18 1:10 AM, Thomas Stüfe wrote:
> Hi Coleen,
>
> We test nightly on Windows 32-bit. I'll go and run some tests on 32-bit
> Linux too.
>
> Thanks for the sponsoring offer! Goetz already reviewed this patch,
> would that be sufficient or should I look for another reviewer from
> Oracle?
That's sufficient. I'll sponsor the patch for you. Can you update the
patch in the webrev?
thanks,
Coleen
>
> Kind Regards, Thomas
>
>
> On Tue, Mar 6, 2018 at 12:59 AM, <coleen.phillimore at oracle.com> wrote:
>
>
> Hi Thomas,
>
> I've read through the new code. I don't have any substantive
> comments. Thank you for adding the functions.
>
> Has this been tested on any 32 bit platforms? I will sponsor
> this when you have another reviewer.
>
> Thanks for taking on the metaspace!
>
> Coleen
>
>
> On 3/1/18 5:36 AM, Thomas Stüfe wrote:
>> Hi Coleen,
>>
>> thanks a lot for the review and the sponsoring offer!
>>
>> New version (full):
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-full/webrev/
>> incremental:
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-incr/webrev/
>>
>> Please find remarks inline:
>>
>>
>> On Tue, Feb 27, 2018 at 11:22 PM, <coleen.phillimore at oracle.com> wrote:
>>
>>
>> Thomas, review comments:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metachunk.hpp.udiff.html
>>
>> +// ChunkIndex (todo: rename?) defines the type of chunk.
>> Chunk types
>>
>>
>> It's really both, isn't it? The type is the index into the
>> free list or in use lists. The name seems fine.
>>
>>
>> You are right. What I meant was that a lot of code needs to know
>> about the different chunk sizes, but by naming it "Index" and
>> adding enum values like "NumberOfFreeLists" we expose
>> implementation details that no-one outside of SpaceManager and
>> ChunkManager cares about (namely, the fact that these values are
>> internally used as indices into arrays). A more neutral naming
>> would be something like "enum ChunkTypes { spec, small, ...,
>> NumberOfNonHumongousChunkTypes, NumberOfChunkTypes }".
>>
>> However, I can leave this out for a possible future cleanup. The
>> change is big enough as it is.
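>>
>> Just to illustrate what I mean (a sketch only, not part of the
>> patch; names are made up):
>>
>>   // Chunk types, ordered by size. The last entries are counts,
>>   // not real chunk types.
>>   enum ChunkTypes {
>>     SpecializedChunkType = 0,
>>     SmallChunkType,
>>     MediumChunkType,
>>     NumberOfNonHumongousChunkTypes,                      // = 3
>>     HumongousChunkType = NumberOfNonHumongousChunkTypes, // = 3
>>     NumberOfChunkTypes                                   // = 4
>>   };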
>>
>> Can you add comments on the #endifs if the matching #ifdef is
>> more than 2-3 lines above? (It's a nit that bothers me.)
>>
>> +#ifdef ASSERT
>> + // A 32bit sentinel for debugging purposes.
>> +#define CHUNK_SENTINEL 0x4d4554EF // "MET"
>> +#define CHUNK_SENTINEL_INVALID 0xFEEEEEEF
>> + uint32_t _sentinel;
>> +#endif
>> + const ChunkIndex _chunk_type;
>> + const bool _is_class;
>> + // Whether the chunk is free (in freelist) or in use by
>> some class loader.
>> bool _is_tagged_free;
>> +#ifdef ASSERT
>> + ChunkOrigin _origin;
>> + int _use_count;
>> +#endif
>> +
>>
>>
>> I removed the asserts completely, following your suggestion below
>> that "origin" would be valuable in customer scenarios too. By
>> that logic, the other members are valuable too: the sentinel is
>> valuable when examining memory dumps to see the start of chunks,
>> and the in-use counter is useful too. What do you think?
>>
>> So, I leave the members in - which, depending on what the C++
>> compiler does with enums and bools, may cost up to 128 bits of
>> additional header space. I think that is ok. In one of my earlier
>> versions of this patch I hand-crafted the header using chars and
>> bitfields to be as small as possible, but that seemed
>> over-engineered.
>>
>> However, I left out any automatic verifications accessing these
>> debug members. These are still only done in debug builds.
>>
>>
>> It seems that you could move _origin and _use_count into
>> the ASSERT block above (maybe putting _use_count before _origin).
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metaspace.cpp.udiff.html
>>
>> In take_from_committed, can the allocation of padding chunks
>> be its own function, like add_chunks_to_alignment(), for lines
>> 1574-1615? The function is too long now.
>>
>>
>> I moved the padding chunk allocation into an own function as you
>> suggested.
>>
>> I don't think coalescation is a word in English, at least my
>> dictionary cannot find it. Although it makes sense in the
>> context, just distracting.
>>
>>
>> I replaced "coalescation" with "chunk merging" throughout the
>> code. Also less of a tongue twister.
>>
>> + // Now check if in the coalescation area there are still
>> life chunks.
>>
>>
>> "live" chunks I guess. A sentence you won't read often :).
>>
>>
>> Now that I read it it almost sounded sinister :) Fixed.
>>
>>
>> In free_chunks_get() can you handle the Humongous case first?
>> The else for humongous chunk size is buried tons of lines below.
>>
>> Otherwise it might be helpful to the logic to make your
>> addition to this function be a function you call like
>> chunk = split_from_larger_free_chunk();
>>
>>
>> I did the latter. I moved the splitting of a larger chunk to an
>> own function. This causes a slight logic change: the new function
>> (ChunkManager::split_chunk()) splits an existing large free
>> chunk into n smaller free chunks and adds them all back to the
>> freelist - that includes the chunk we are about to return. That
>> allows us to use the same exit path - which removes the chunk
>> from the freelist and adjusts all counters - in the caller
>> function "ChunkManager::free_chunks_get" instead of having to
>> return in the middle of the function.
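>>
>> Roughly, the resulting control flow looks like this (heavily
>> simplified pseudo code, not the actual patch; helper names and
>> signatures are made up):
>>
>>   Metachunk* ChunkManager::free_chunks_get(size_t word_size) {
>>     Metachunk* chunk = find_free_chunk_of_size(word_size);
>>     if (chunk == NULL) {
>>       Metachunk* larger = find_larger_free_chunk(word_size);
>>       if (larger != NULL) {
>>         // Split "larger" into n smaller free chunks - one of them
>>         // of the requested size - and put all of them back into
>>         // the freelist.
>>         split_chunk(word_size, larger);
>>         chunk = find_free_chunk_of_size(word_size);
>>       }
>>     }
>>     if (chunk != NULL) {
>>       // Common exit path: unlink from the freelist, adjust counters.
>>       remove_chunk_from_freelist(chunk);
>>     }
>>     return chunk;
>>   }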
>>
>> To make it more readable, I also moved the
>> "test-that-free-chunks-are-optimally-merged" verification - which
>> was quite lengthy - out of VirtualSpaceNode::verify() into a new
>> function, VirtualSpaceNode::verify_free_chunks_are_ideally_merged().
>>
>>
>> You might want to keep the origin in product mode if it
>> doesn't add to the chunk footprint. Might help with
>> customer debugging.
>>
>>
>> See above
>>
>> Awesome looking test...
>>
>>
>> Thanks, I was worried it would be too complicated.
>> I changed it a bit because there were sporadic errors. Not a
>> "real" error, just the test itself was faulty. The
>> "metaspaces_in_use" counter was slightly wrong in one corner case.
>>
>> I've read through most of this, and thank you for adding this
>> to at least partially solve the fragmentation problem. The
>> irony is that we templatized the Dictionary from CMS so that
>> we could use it for Metaspace, and that has splitting and
>> coalescing, but it seems this code makes more sense than
>> adapting that code (if it's even possible).
>>
>>
>> Well, it helps other metadata use cases too, no?
>>
>>
>> Thank you for working on this. I'll sponsor this for you.
>>
>> Coleen
>>
>>
>>
>> Thanks again!
>>
>> I also updated my jdk-submit branch to include these latest
>> changes; tests are still running.
>>
>> Kind Regards, Thomas
>>
>>
>> On 2/26/18 9:20 AM, Thomas Stüfe wrote:
>>
>> Hi all,
>>
>> I know this patch is a bit larger, but may I please have
>> reviews and/or
>> other input?
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8198423
>> Latest version:
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/
>>
>> For those who followed the mail thread, this is the
>> incremental diff to the last changes (it includes feedback
>> Goetz gave me on- and off-list):
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev-incr/webrev/
>>
>> Thank you!
>>
>> Kind Regards, Thomas Stuefe
>>
>>
>>
>> On Thu, Feb 8, 2018 at 12:58 PM, Thomas Stüfe
>> <thomas.stuefe at gmail.com> wrote:
>>
>> Hi,
>>
>> We would like to contribute a patch developed at SAP
>> which has been live
>> in our VM for some time. It improves the metaspace
>> chunk allocation:
>> reduces fragmentation and raises the chance of
>> reusing free metaspace
>> chunks.
>>
>> The patch:
>> http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-05--2/webrev/
>>
>> In short, this patch helps with a number of
>> pathological cases where
>> metaspace chunks are free but cannot be reused
>> because they are of the
>> wrong size. For example, the metaspace freelist could
>> be full of small
>> chunks, which would not be reusable if we need larger
>> chunks. So, we could
>> get metaspace OOMs even in situations where the
>> metaspace was far from
>> exhausted. Our patch adds the ability to split and
>> merge metaspace chunks
>> dynamically and thus remove the "size-lock-in" problem.
>>
>> Note that there have been other attempts to get a
>> grip on this problem,
>> see e.g.
>> "SpaceManager::get_small_chunks_and_allocate()". But
>> arguably
>> our patch attempts a more complete solution.
>>
>> In 2016 I discussed the idea for this patch with some
>> folks off-list, among them Jon Masamitsu. He then advised me to
>> create a JEP. So I did:
>> [1]. However, meanwhile changes to the JEP process
>> were discussed [2], and
>> I am not sure anymore this patch even needs a
>> JEP. It may be
>> moderately complex and hence carries the risk
>> inherent in any patch, but
>> its effects would not be externally visible (if you
>> discount seeing fewer
>> metaspace OOMs). So, I'd prefer to handle this as a
>> simple RFE.
>>
>> --
>>
>> How this patch works:
>>
>> 1) When a class loader dies, its metaspace chunks are
>> freed and returned
>> to the freelist for reuse by the next class loader.
>> With the patch, upon
>> returning a chunk to the freelist, an attempt is made
>> to merge it with its
>> neighboring chunks - should they happen to be free
>> too - to form a larger
>> chunk, which is then placed in the free list.
>>
>> As a result, the freelist should be populated by
>> larger chunks at the
>> expense of smaller chunks. In other words, all free
>> chunks should always be
>> as "coalesced as possible".
>>
>> 2) When a class loader needs a new chunk and a chunk
>> of the requested size
>> cannot be found in the free list, before carving out
>> a new chunk from the
>> virtual space, we first check if there is a larger
>> chunk in the free list.
>> If there is, that larger chunk is chopped up into n
>> smaller chunks. One of
>> them is returned to the caller, the others are
>> re-added to the freelist.
>>
>> (1) and (2) together have the effect of removing the
>> size-lock-in for
>> chunks. If fragmentation allows it, small chunks are
>> dynamically combined
>> to form larger chunks, and larger chunks are split on
>> demand.
>>
>> --
>>
>> What this patch does not:
>>
>> This is not a rewrite of the chunk allocator - most
>> of the mechanisms stay
>> intact. Specifically, chunk sizes remain unchanged,
>> and so do chunk
>> allocation processes (which class loaders get
>> handed which chunk
>> size, and when). Almost everything this patch does affects only
>> internal workings of
>> the ChunkManager.
>>
>> Also note that I refrained from doing any cleanups,
>> since I wanted
>> reviewers to be able to gauge this patch without
>> filtering noise.
>> Unfortunately this patch adds some complexity. But
>> there are many future
>> opportunities for code cleanup and simplification,
>> some of which we already
>> discussed in existing RFEs ([3], [4]). All of them
>> are out of the scope for
>> this particular patch.
>>
>> --
>>
>> Details:
>>
>> Before the patch, the following rules held:
>> - All chunk sizes are multiples of the smallest chunk
>> size ("specialized
>> chunks")
>> - All chunk sizes of larger chunks are also clean
>> multiples of the next
>> smaller chunk size (e.g. for class space, the ratio of
>> specialized/small/medium chunks is 1:2:32)
>> - All chunk start addresses are aligned to the
>> smallest chunk size (more
>> or less accidentally, see metaspace_reserve_alignment).
>> The patch makes the last rule explicit and more strict:
>> - All (non-humongous) chunk start addresses are now
>> aligned to their own
>> chunk size. So, e.g. medium chunks are allocated at
>> addresses which are a
>> multiple of medium chunk size. This rule is not
>> extended to humongous
>> chunks, whose start addresses continue to be aligned
>> to the smallest chunk
>> size.
>>
>> The reason for this new alignment rule is that it
>> makes it cheap both to
>> find chunk predecessors of a chunk and to check which
>> chunks are free.
>>
>> When a class loader dies and its chunk is returned to
>> the freelist, all we
>> have is its address. In order to merge it with its
>> neighbors to form a
>> larger chunk, we need to find those neighbors,
>> including those preceding
>> the returned chunk. Prior to this patch that was not
>> easy - one would have
>> to iterate chunks starting at the beginning of the
>> VirtualSpaceNode. But
>> due to the new alignment rule, we now know where the
>> prospective larger
>> chunk must start - at the next lower
>> larger-chunk-size-aligned boundary. We
>> also know that currently a smaller chunk must start
>> there (*).
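>>
>> (In code, finding that prospective start is a cheap alignment
>> operation - illustrative sketch only, assuming power-of-two
>> chunk byte sizes:)
>>
>>   // Start address of the larger chunk which would contain p,
>>   // given that chunk starts are aligned to their own chunk size.
>>   static MetaWord* prospective_larger_chunk_start(MetaWord* p,
>>                                                   size_t larger_chunk_byte_size) {
>>     const uintptr_t mask = (uintptr_t)larger_chunk_byte_size - 1;
>>     return (MetaWord*)((uintptr_t)p & ~mask);
>>   }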
>>
>> In order to check the free-ness of chunks quickly,
>> each VirtualSpaceNode
>> now keeps a bitmap which describes its occupancy. One
>> bit in this bitmap
>> corresponds to a range the size of the smallest chunk
>> size and starting at
>> an address aligned to the smallest chunk size.
>> Because of the alignment
>> rules above, such a range belongs to one single
>> chunk. The bit is 1 if the
>> associated chunk is in use by a class loader, 0 if it
>> is free.
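>>
>> (Conceptually the lookup is just this - a sketch, the member
>> names are made up:)
>>
>>   // One bit per smallest-chunk-sized region of the node.
>>   // Bit set: region belongs to a chunk in use; clear: free.
>>   bool OccupancyMap::is_in_use(const MetaWord* p) const {
>>     const size_t idx = (size_t)(p - _node_base) / _smallest_chunk_word_size;
>>     return (_map[idx / 8] & (1u << (idx % 8))) != 0;
>>   }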
>>
>> When we have calculated the address range a
>> prospective larger chunk would
>> span, we now need to check if all chunks in that
>> range are free. Only then
>> we can merge them. We do that by querying the bitmap.
>> Note that the most
>> common use case here is forming medium chunks from
>> smaller chunks. With the
>> new alignment rules, the bitmap portion covering a
>> medium chunk now always
>> happens to be 16 or 32 bits in size and is 16- or
>> 32-bit aligned, so reading
>> the bitmap in many cases becomes a simple 16- or
>> 32-bit load.
>>
>> Only if the range is free do we need to iterate
>> the chunks in that
>> range: pull them from the freelist, combine them into
>> one new larger chunk, and
>> re-add that one to the freelist.
>>
>> (*) Humongous chunks make this a bit more
>> complicated. Since the new
>> alignment rule does not extend to them, a humongous
>> chunk could still
>> straddle the lower or upper boundary of the
>> prospective larger chunk. So I
>> gave the occupancy map a second layer, which is used
>> to mark the start of
>> chunks.
>> An alternative approach could have been to make
>> humongous chunk size and
>> start address always a multiple of the largest
>> non-humongous chunk size
>> (medium chunks). That would have caused a bit of
>> waste per humongous chunk
>> (<64K) in exchange for simpler coding and a simpler
>> occupancy map.
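>>
>> So, per smallest-chunk-sized region the map now answers two
>> questions (a sketch of the interface only):
>>
>>   // Layer 1: is this region part of a chunk in use?
>>   bool is_in_use(const MetaWord* p) const;
>>   // Layer 2: does a chunk start exactly at this region? Needed
>>   // to detect humongous chunks straddling the boundaries of the
>>   // prospective larger chunk.
>>   bool is_chunk_start(const MetaWord* p) const;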
>>
>> --
>>
>> The patch shows its best results in scenarios where a
>> lot of smallish
>> class loaders are alive simultaneously. When dying,
>> they leave contiguous
>> expanses of metaspace covered in small chunks, which
>> can be merged nicely.
>> However, if class loader life times vary more, we
>> have more interleaving of
>> dead and alive small chunks, and hence chunk merging
>> does not work as well
>> as it could.
>>
>> For an example of a pathological case like this, see
>> the example program: [5]
>>
>> Executed like this: "java
>> -XX:CompressedClassSpaceSize=10M -cp test3
>> test3.Example2" the test will load 3000 small classes
>> in separate class
>> loaders, then throw them away and start loading large
>> classes. The small
>> classes will have flooded the metaspace with small
>> chunks, which are
>> unusable for the large classes. When executing with
>> the rather limited
>> CompressedClassSpaceSize=10M, we will run into an OOM
>> after loading about
>> 800 large classes, having used only 40% of the class
>> space; the rest is
>> wasted on unused small chunks. However, with our
>> patch the example program
>> will manage to allocate ~2900 large classes before
>> running into an OOM, and
>> class space will show almost no waste.
>>
>> To demonstrate this, add -Xlog:gc+metaspace+freelist.
>> After running into
>> an OOM, statistics and an ASCII representation of the
>> class space will be
>> shown. The unpatched version will show large expanses
>> of unused small
>> chunks; the patched variant will show almost no waste.
>>
>> Note that the patch could be made more effective with
>> a different size
>> ratio between small and medium chunks: in class
>> space, that ratio is 1:16,
>> so 16 small chunks must happen to be free to form one
>> larger chunk. With a
>> smaller ratio the chance for coalescation would be
>> larger. So there may be
>> room for future improvement here: Since we now can
>> merge and split chunks
>> on demand, we could introduce more chunk sizes.
>> Potentially arriving at a
>> buddy-ish allocator style where we drop hard-wired
>> chunk sizes for a
>> dynamic model where the ratio between chunk sizes is
>> always 1:2 and we
>> could in theory have no limit to the chunk size? But
>> this is just a thought
>> and well out of the scope of this patch.
>>
>> --
>>
>> What does this patch cost (memory):
>>
>> - the occupancy bitmap adds 1 byte per 4K metaspace.
>> - Metachunk headers get larger, since we add an
>> enum and two bools to them.
>> Depending on what the C++ compiler does with that,
>> chunk headers grow by
>> one or two MetaWords, reducing the payload size by
>> that amount.
>> - The new alignment rules mean we may need to create
>> padding chunks to
>> precede larger chunks. But since these padding chunks
>> are added to the
>> freelist, they should be used up before the need for
>> new padding chunks
>> arises. So, the maximum possible amount of memory tied up in
>> unused padding chunks should
>> be limited by design to about 64K (see the sketch below).
>>
>> The expectation is that the memory savings by this
>> patch far outweigh its
>> added memory costs.
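>>
>> (For the padding chunks mentioned above, the idea when carving a
>> new chunk from the virtual space is roughly this - a simplified
>> sketch, not the actual code; the helper names are made up:)
>>
>>   // Fill the gap between the current allocation top and the next
>>   // address aligned to the requested chunk size with padding
>>   // chunks of the smallest chunk size, and give them to the
>>   // freelist so they can be reused like any other free chunk.
>>   const size_t a = chunk_word_size * BytesPerWord;  // power of 2
>>   MetaWord* const aligned_top =
>>       (MetaWord*)(((uintptr_t)_top + a - 1) & ~(uintptr_t)(a - 1));
>>   while (_top < aligned_top) {
>>     Metachunk* pad = create_padding_chunk_at(_top);  // smallest chunk size
>>     return_chunk_to_freelist(pad);
>>     _top += smallest_chunk_word_size;
>>   }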
>>
>> .. (performance):
>>
>> We did not see measurable drops in standard
>> benchmarks rising above the
>> normal noise. I also measured times for a program
>> which stresses metaspace
>> chunk coalescation, with the same result.
>>
>> I am open to suggestions on what else I should measure,
>> and/or to independent
>> measurements.
>>
>> --
>>
>> Other details:
>>
>> I removed
>> SpaceManager::get_small_chunk_and_allocate() to reduce
>> complexity somewhat, because it was made mostly
>> obsolete by this patch:
>> since small chunks are combined into larger chunks upon
>> return to the
>> freelist, in theory we should not have that many free
>> small chunks anymore
>> anyway. However, there may still be cases where we
>> could benefit from this
>> workaround, so I am asking your opinion on this one.
>>
>> About tests: There were two native tests -
>> ChunkManagerReturnTest and
>> TestVirtualSpaceNode (the former was added by me last
>> year) - which did not
>> make much sense anymore, since they relied heavily on
>> internal behavior
>> which was made unpredictable with this patch.
>> To make up for these lost tests, I added a new gtest
>> which attempts to
>> stress the many combinations of allocation patterns
>> but does so from a layer
>> above the old tests. It now uses
>> Metaspace::allocate() and friends. By
>> using that as the entry point for tests, I am less
>> dependent on implementation
>> internals and still cover a lot of scenarios.
>>
>> --
>>
>> Review pointers:
>>
>> Good points to start are
>> - ChunkManager::return_single_chunk() - specifically,
>> ChunkManager::attempt_to_coalesce_around_chunk() -
>> here we merge chunks
>> upon return to the free list
>> - ChunkManager::free_chunks_get(): Here we now split
>> large chunks into
>> smaller chunks on demand
>> - VirtualSpaceNode::take_from_committed(): chunks
>> are allocated
>> according to the alignment rules now, and padding chunks are handled here
>> - The OccupancyMap class is the helper class
>> implementing the new
>> occupancy bitmap
>>
>> The rest is mostly chaff: helper functions, added
>> tests and verifications.
>>
>> --
>>
>> Thanks and Best Regards, Thomas
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8166690
>> [2] http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November/000128.html
>> [3] https://bugs.openjdk.java.net/browse/JDK-8185034
>> [4] https://bugs.openjdk.java.net/browse/JDK-8176808
>> [5]
>> https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip
>>
>>
>>
>>
>>
>
>