RFR: 8251158: Implementation of JEP 387: Elastic Metaspace

Thomas Stüfe thomas.stuefe at gmail.com
Sat Aug 22 13:40:19 UTC 2020


Hi Albert,

On Sat, Aug 22, 2020 at 10:47 AM Albert Yang <albert.m.yang at oracle.com>
wrote:

> Hi Thomas,
>
> Thank you for preparing the review guide, which has been quite helpful
> to see the big picture, and in providing a starting point to embrace
> this large patch/code base.
>
> ## Metachunk exposing committed memory status
>
> One thing I find odd is that chunk doesn't encapsulate committed memory
> status. In other words, a component interacting with a chunk needs to be
> aware of how much memory is committed. I will provide two examples in
> code, one is the "consumer" of `Metachunk`, the other the "producer" of
> `Metachunk`.
>
> For the "consumer" case, `MetaspaceArena::allocate`, allocation is
> broken into two steps, 1. for committed memory, 2. for virtual memory
> (+committed memory from step 1).
>
> ```
> if (!current_chunk_too_small) {
>   if (!current_chunk()->ensure_committed_additional(raw_word_size)) {
>     UL2(info, "commit failure (requested size: " SIZE_FORMAT ")", raw_word_size);
>     commit_failure = true;
>   }
> }
>
> // Allocate from the current chunk. This should work now.
> if (!current_chunk_too_small && !commit_failure) {
>   p = current_chunk()->allocate(raw_word_size);
>   assert(p != NULL, "Allocation from chunk failed.");
> }
> ```
>
> If the "committed memory status" is properly encapsulated in
> `Metachunk::allocate`, the above code could be like:
>
> ```
> if (!current_chunk_too_small) {
>    p = current_chunk()->allocate(raw_word_size);
> }
> ```
>
> For the "producer" case, `ChunkManager::get_chunk` mostly deals with
> identifying a chunk which satisfied some constraints, but the following
> code focuses on committed bytes, which, in my opinion, is on another
> abstraction level. I believe that moving such logic to
> `Metachunk::allocate` could make code more readable.
>
> ```
> // Attempt to commit the chunk (depending on settings, we either fully
> // commit it or just commit enough to get the caller going). That may
> // fail if we hit a commit limit. In that case put the chunk back to the
> // freelist (re-merging it with its neighbors if we did split it) and
> // return NULL.
> const size_t to_commit = Settings::new_chunks_are_fully_committed() ?
>                          c->word_size() : min_committed_words;
> if (c->committed_words() < to_commit) {
>   if (c->ensure_committed_locked(to_commit) == false) {
>     UL2(info, "failed to commit " SIZE_FORMAT " words on chunk " METACHUNK_FORMAT ".",
>         to_commit, METACHUNK_FORMAT_ARGS(c));
>     c->set_in_use(); // gets asserted in return_chunk().
>     return_chunk_locked(c);
>     c = NULL;
>   }
> }
> ```
>
>
These are two questions (both good ones):

1) why does Metachunk::allocate not commit its own memory
2) why does ChunkManager::get_chunk() concern itself with committing the
returned chunk

(1) This is a matter of taste, rather unimportant. I played around with a
more "intelligent" Metachunk but found it easier to understand if the chunk
was left dumb - not taking any action on its own. Note that there are
cases where we allocate from a chunk but do not want to auto-commit further
memory, see MetaspaceArena::salvage_chunk(). Not a strong reason, just a
matter of taste.
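
To illustrate what "dumb" means here, a minimal sketch (simplified
members, not the actual class): allocation is a pure pointer bump within
the already committed part of the chunk, and the chunk never commits
anything on its own.

```
class Metachunk {
  MetaWord* _base;            // start of the chunk
  size_t    _word_size;       // reserved size of the chunk, in words
  size_t    _committed_words; // committed prefix, in words
  size_t    _used_words;      // allocation top

public:
  // Returns NULL if the request does not fit into committed space; the
  // caller decides whether to commit more first (or, as in
  // salvage_chunk(), deliberately not to).
  MetaWord* allocate(size_t request_words) {
    if (_used_words + request_words > _committed_words) {
      return NULL;
    }
    MetaWord* const p = _base + _used_words;
    _used_words += request_words;
    return p;
  }
};
```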

(2): Earlier prototypes of ChunkManager::get_chunk() did not concern
themselves with committing and made no promises about the commit state of
the returned chunk. So, MetaspaceArena::allocate() had to take care to
commit the chunk it got before allocating.

Two problems though. One is a small performance issue. To commit memory we
need the global metaspace expand lock, and in ChunkManager::get_chunk() we
already hold that lock. Since the caller, MetaspaceArena::allocate(), would
invariably commit the start of the chunk anyway, we might just as well
commit it right there in ChunkManager::get_chunk(), while we have the lock.
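
In sketch form (simplified signature, hypothetical helper names; only the
locking pattern is the point):

```
Metachunk* ChunkManager::get_chunk(chunklevel_t preferred_level,
                                   size_t min_committed_words) {
  // The expand lock is taken once here...
  MutexLocker ml(MetaspaceExpand_lock);
  // remove_first_chunk_at_level() is a hypothetical stand-in for the
  // freelist search / splitting logic.
  Metachunk* c = remove_first_chunk_at_level(preferred_level);
  if (c == NULL) {
    return NULL;
  }
  // ...and reused for the initial commit, sparing the caller a second
  // lock round trip. The error path also stays inside ChunkManager.
  if (!c->ensure_committed_locked(min_committed_words)) {
    c->set_in_use();
    return_chunk_locked(c);
    return NULL;
  }
  return c;
}
```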

The second problem is more subtle but also more important: error handling
is intricate.

Committing memory may fail; we may hit a limit (the GC threshold or
MaxMetaspaceSize). In that case MetaspaceArena would be left with a fully
uncommitted, useless current chunk. What to do with this chunk?

Keeping it as current chunk in the Arena is no option.
MetaspaceArena::allocate() will now return NULL to signal allocation
failure. Let's say we hit MaxMetaspaceSize. The caller will attempt a GC.
Maybe it succeeds and frees some loaders. Maybe that causes the
ChunkManager freelists to be repopulated with free committed chunks. But
maybe there are not enough free chunks to actually uncommit something
(none of the returned chunks being >= commit granule size). So we still
have no commit quota left to commit more memory. What we now do have is
free committed chunks in the freelists, which we could use. So, the caller
calls MetaspaceArena::allocate() again. It still has its current
uncommitted chunk. It would, again, try to commit that chunk. Fail again.
Return NULL again. Wrongly so, since the ChunkManager freelists are now
populated with free committed chunks which we could have used had we
ignored the current chunk. So we get an OOM even though the GC beforehand
did clean up some loaders successfully.
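
Schematically, the retry business described above looks like this
(hypothetical helper names; the real flow goes through the GC's
metadata-allocation retry path):

```
MetaWord* allocate_with_retry(MetaspaceArena* arena, size_t word_size) {
  MetaWord* p = arena->allocate(word_size);
  while (p == NULL && should_retry_after_gc()) { // hypothetical predicate
    run_gc(); // may purge dead loaders, repopulating the ChunkManager
              // freelists with free *committed* chunks
    // The retry must be able to pick up those committed chunks. If the
    // arena kept its uncommitted current chunk and retried the commit on
    // it, it would fail again and wrongly report OOM.
    p = arena->allocate(word_size);
  }
  return p;
}
```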

So, Arena needs to let go of the uncommitted chunk after a commit failure.
Okay, easy enough. We could salvage that thing - add it to the list of
retired chunks in MetaspaceArena. But since this "retry allocation"
business is done repeatedly (see MetaspaceGC) we would, in this corner
case, needlessly accumulate a number of completely uncommitted retired
chunks.

Last option: failing to commit this new chunk, we could return it to the
ChunkManager and attempt to get a new one - one of those which had just
been added after the GC. This is basically what happens now, only that it
is handled completely inside ChunkManager::get_chunk().

Bottom line: error handling is just simpler and cleaner if
ChunkManager::get_chunk() never hands out a fully uncommitted chunk in the
first place, and if we make it ChunkManager's business to ensure the
returned chunk is at least usable to satisfy the allocation which caused
its creation.


> ## The motivation for using a buddy-allocator
>
> It seems that `MetaspaceArena::deallocate()` never releases the memory
> to the OS (memory is still committed); instead, it keeps that memory in
> a free list for future allocation. In other words, memory is never
> uncommitted until `~MetaspaceArena()`. Then, it's not obvious to me why
> a buddy-allocator is used. I was thinking lazy commit would be enough,
> something like the following. Maybe I overlooked some constraints in the
> original problem spec. Any clarification/elaboration is appreciated, and
> it should probably be documented in the corresponding file(s).
>
> ```
> struct root_chunk {
>   word* _end;
>   word* _committed_end;
>   word* _top;
>
>   word* alloc(size_t s_in_words) {
>     if (_top + s_in_words > _end) {
>       // no space left in this chunk
>       return nullptr;
>     }
>     if (_top + s_in_words > _committed_end) {
>       const size_t commit_words = round_up(s_in_words, 64 * K);
>       if (!commit(_committed_end, commit_words)) {
>         // can't commit
>         return nullptr;
>       }
>       // update commit end
>       _committed_end += commit_words;
>     }
>     // success path
>     word* ret = _top;
>     _top += s_in_words;
>     return ret;
>   }
> };
> ```
>

I am not sure I understand your question. The buddy allocator has nothing
to do with the deallocation free list, and neither has much to do with
uncommitting.

Deallocation is a rare path taken when we prematurely deallocate metadata,
e.g. on class redefinition or if we have class loading errors. The standard
way to release metaspace is by a loader dying. The story behind
deallocate() is basically: In 99.9% of all cases we have an allocation
pattern very well suited for arena allocators - bulk free and so on - but
in rare cases this story breaks apart. E.g. when facing an insane level of
class redefinitions. For those pathological non-bulk-delete cases the
allocator shall be good enough to not make the VM explode.

Deallocation happens, like allocation, on the block level (arbitrarily
sized allocation units of 8...~512K bytes), with 99% of those allocations
being in the single-to-dozen-word range. The buddy allocator works on the
chunk level, which is typically much coarser (the 1K...4M range).
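
To make the two levels concrete, a rough sketch (simplified shape, not the
actual classes): deallocate() just parks the block in a per-arena
freelist; the buddy allocator never sees individual blocks, only whole
chunks.

```
class MetaspaceArena {
  FreeBlocks* _fbl; // block-level freelist, fed by deallocate()

public:
  void deallocate(MetaWord* p, size_t word_size) {
    // The block goes into the freelist for possible reuse by a later
    // allocate(); the underlying memory stays committed.
    _fbl->add_block(p, word_size);
  }
};
```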

Or do you, perhaps, ask why we do not just allocate uniformly-sized chunks,
give those to the Arenas and let them be committed on demand? That way we
would not need a buddy allocator since all chunks would have the same size.

This is what I think Leo asked in the call. The question is valid. The
answer:

To have a truly uniform chunk size it would have to be large enough for
every conceivable allocation. So, a root chunk. If we give each loader a
root chunk and let those chunks lazily commit themselves, we have the
following problems:

- We'd run up a ridiculous process vsize. Let's say you have 10000 class
loaders (not unreasonable, and far from the worst I have ever seen). You'd
use up about 80GB of address space for metaspace alone (with compressed
class space active; see the arithmetic sketch after this list). That is
plain impossible on 32-bit (okay, there we would have no ccs, but we would
still need 40G). On 64-bit, on most platforms vsize may not matter that
much, but it would still raise eyebrows.

- You would of course very quickly run against CompressedClassSpaceSize -
one place where reserved size actually matters. Let's say we run with the
default 1G of ccs. That gives us space enough for 256 root chunks. Assuming
no arena ever wants a second chunk for class space allocation, that would
mean we cannot have more than 256 loaders/CLDs.

- Most importantly, each arena would use up a complete commit granule. By
default 64K. Way too much. Even if we shrink granule size to the lowest
possible size, one 4K page (let's ignore 64K-paged platforms), we would
still use up 4K per loader. Still too much. As can be seen from the valiant
attempt of RedHat when they tuned Metaspace usage for WildFly Swarm: we
have small loaders which only load a handful of classes or sometimes only
one class; these were the reason the original Metaspace had those
"specialized chunks" of 1K.

In other words, chunks can be smaller than a commit granule, and that
makes sense, since we want multiple small loaders to share a single
granule.

Finally, if you then say "okay, then let's introduce multiple chunk sizes,
maybe really small ones for small loaders", you are down the rabbit hole of
the original Metaspace. The moment you have multiple chunk sizes you need a
way to merge and split them, etc. Witness the awkward way current Metaspace
does this (occupancy bitmap). The moment you have multiple chunk sizes, the
proposed buddy allocator is just the simplest implementation.
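
For illustration, the heart of a buddy allocator in this setting is just
recursive halving and re-merging of power-of-two chunks. A sketch with
hypothetical helpers (the patch's real splitting logic lives elsewhere):

```
// Split a free chunk down to the requested level (level 0 = 4M root
// chunk; each level up halves the size). Every unused half - the
// "buddy" - goes back on the freelist of its level; merging on free is
// the exact mirror image.
Metachunk* ChunkManager::split_to_level(Metachunk* c, chunklevel_t target_level) {
  while (c->level() < target_level) {
    Metachunk* buddy = c->split_in_half(); // hypothetical: halve c and
                                           // return the second half
    add_to_freelist(buddy);                // hypothetical helper
  }
  return c;
}
```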

---


>
> ## minor comments
>
> `MetaspaceArena::retire_current_chunk()` is referenced in some places,
> but there's no such method. Maybe stale comments.
>
>
Good catch.


>
> Better to use more informative names than `_chunks`, e.g.
> `_allocated_chunks` vs `_free_chunks`, since they are quite different.
>
>
Matter of taste. I disagree, since "allocated" is implicit for an arena. I
would only adorn the member names with adjectives if there were other
members in the same class to differentiate them from.


> In `MetaspaceArena`,
>
> ```
> // List of chunks. Head of the list is the current chunk.
> MetachunkList _chunks;
> ```
> and `ChunkManager`
> ```
> // Freelists
> FreeChunkListVector _chunks;
> ```
> --
> /Albert
>

Thanks Albert. Hope I could answer your questions.

Cheers, Thomas


