RFR: 8251158: Implementation of JEP 387: Elastic Metaspace

Wed Aug 26 06:31:19 UTC 2020

Hi Leo,

On Tue, Aug 25, 2020 at 4:29 PM Leo Korinth <leo.korinth at oracle.com> wrote:

> Hi, review of: freeBlocks.?pp, blockTree.?pp and binList.hpp below
>
> I have some comments in addition to those from Coleen:
>
> ### freeBlocks.hpp:
>
> 75
> 76   static const size_t splinter_threshold = 0;// 0x100;
> 77
>
> splinter_threshold with the strange comment // 0X100, what is it
> supposed to be used for?  (always 0, but with a comment 0x100;).

The splinter threshold was supposed to aid in the decision of whether it is
worth the trouble preserving the remainder of a block. Lets say someone
needs 100 words, I only find a 120 word block in the tree. I hand this to
the caller, but do I chop off the remaining 20 blocks and re-store them or
do I waste 20 blocks? This is handled in FreeBlocks::get_block().

After many tests, it turned out to be just better to always chop off the
remaining space instead of wasting it. Especially since if it is small
enough it falls under the responsibility of the BinList, which has O(1) for
insert. That is also the reason why a single threshold value for both tree
and binlist is not a good fit anyway.

Therefore last week I removed the threshold value. Makes the coding also
easier to read.

> Maybe
> related to this, is that I have problems to understand how allocation
> guards are handled when memory is put back to the freelist, I can not
> see the pointer being decremented (the inverse operation of
> payload_from_prefix, prefix_from_payload seems never to be used, and I
> fail to find the operation done manually either.
>

Not related to that, but you are correct. Switching on include guards
switches off deallocation, see settings.cpp:109ff. So, FreeBlocks does not
get used with guards. Which means we lose deallocated blocks, but the
guards are a debug feature only and in most scenarios deallocation plays a
small role.

I could make guards work with deallocation too; I did not do so since which
would have made the freeblock handling more complicated. Since I was not
sure how the guards would be received by Reviewers, and since they are a
non-essential feature, I wanted to wait for feedback before putting in
further work.

(Note that despite this shortcoming the guards work just fine).

About prefix_from_payload - you are right, that can be removed.

>
> 68   typedef BinList32 SmallBlocksType;
> 69
> 70   // _small_blocks takes care of small to very small blocks.
> 71   SmallBlocksType _small_blocks;
>
>
> I think all of BinListXX typedefs can be removed and also the
> SmallBlocksType typedef. The SmallBlocksType is only used once if we
> rewrite
>
> const static size_t minimal_word_size = SmallBlocksType::minimal_word_size;
> to
> const static size_t minimal_word_size = _small_blocks.minimal_word_size;
>
>
This would not compile:
msFreeBlocks.hpp:79:43: error: invalid use of non-static data member
'metaspace::FreeBlocks::_small_blocks'

> (and also change the name of minimal_word_size so that we can see it
> is a constant)
>
>
How?

> 84
> 85   // Retrieve a block of at least requested_word_size.
> 86   MetaWord* get_block(size_t requested_word_size);
> 87
>
> The function named get_block, this is not a getter, this mutates the
> object. I would prefer remove_block (goes well with add_block), and I
> would like that rename to propagate to both blockTree and binList.
>
>
My only misgiving would be that remove_block evokes a function which takes
a block reference and removes it, not one that finds a block and
removes it. But any alternative I can come up with ("find_and_remove_block"
e.g) is too much of a mouthful.

Okay, remove_block it is.

> ###  binList.hpp:
>
> Maybe another name for binList, binLookup for example?  When I hear
> list I am thinking /linked/ list and that might just be my problem,
> sorry if that is that case, and then ignore this point.
>
>
I'd rather keep that name. "BinList" is very roughly inspired by the fast
bins in glibc heap. It is also a list, since these are threaded blocks.

Strictly speaking it would be a BinListVector - a vector of list heads. If
your heart is set on a name change, I'd use that one.

> 76
> 77 template <size_t smallest_size, int num_lists>
> 78 class BinListImpl {
> 79
>
> Template argument smallest_size would be nice if the unit of the size
> (words) is mentioned in the name. It is so easy to mix sizes up.
>

Agreed.

>
> 82   // a mask to speed up searching for populated lists.
> 83   // 0 marks an empty list, 1 for a non-empty one.
> 84   typedef uint32_t mask_t;
>
> Is a mask really speeding up anything? Seems the code would be a bit
> simpler removing the mask.
>
>
I'll check.

> 158   // Given a word_size, searches and returns a block of at least that
> size.
> 159   // Block may be larger. Real block size is returned in
> *p_real_word_size.
> 160   MetaWord* get_block(size_t word_size, size_t* p_real_word_size) {
> get_block -> remove_block
>
> 217 typedef BinListImpl<2, 8>  BinList8;
> 218 typedef BinListImpl<2, 16> BinList16;
> 219 typedef BinListImpl<2, 32> BinList32;
> 220 typedef BinListImpl<2, 64> BinList64;
>
> BinListXX typedefs could be removed I think. (see comment above)
>
>
Okay.

>
> ### blocktree.hpp
>
> 79   struct node_t {
> 80
> 81     // Normal tree node stuff...
> 82     node_t* parent;
> 83     node_t* left;
> 84     node_t* right;
> 85
>
>
> I think in the future we could make the node_t without a parent
> pointer, that would make the node size somewhat smaller and we could
> then be able to fit smaller memory areas into the tree.

The way the tree is used node size does not really matter since into the
tree go only larger blocks, smaller blocks are handled by the BinList. It
would be a factor if we wanted to generalize the tree, but otherwise I
think this is unnecessary.

You need the parent pointer for searching predecessor/successor nodes in a
BST, as well as for modifying the parent when removing a node. I do not see
offhand how this could be done without a parent pointer (maybe, when
walking, build up a stack of visited nodes - assuming all operations always
start at root - but I think that coding would be more complicated and
non-standard).

> The code
> should be somewhat simpler as well I think.

Not sure how; I tried to dumb it down as much as possible. In the end
it's pretty standard stuff.  I'm open for ideas of course, but would prefer
to do it as a follow up.

> Maybe file a future
> improvement for this?
>
> 98   // Largest node size, (bit arbitrarily) capped at 4M since we know
> this to
> 99   // be the max possible metaspace allocation size. TODO. Do this
> better.
> 100   const static size_t maximal_word_size = 4 * M;
>
> maximal_word_size should take its value from MAX_CHUNK_BYTE_SIZE in
> some way if it is not too hard.
>
>
Agreed.

> 102   // We need nodes to be at least large enough to hold a node_t
> 103   const static size_t minimal_word_size =
> 104       (sizeof(node_t) + sizeof(MetaWord) - 1) / sizeof(MetaWord);
>
> maybe add a division-ceil comment?
>
>
Okay. I could also use align_up, that may be clearer.

No wait, I think I cannot use align_up in constant expressions :/

I'll add a comment.

> 172   // Given a node n, return its predecessor in the tree
> 173   // (node with the next-smaller size).
> 174   static node_t* successor(node_t* n) {
>
> The comment is wrong on successor (I guess it is copied from predecessor).
>
>
Good catch.

> 199     if (parent != NULL) {
> 200       if (parent->left == child) { // I am a left child
>
> I think  "// child is a left child" is better
>
>
Ok

> 215   // Given a node n and a node forebear, insert n under forebear
> 216   void insert(node_t* forebear, node_t* n) {
>
> The name forebear confused me, it is just a root for
> insertion, and has no existing ancestor relation to "n".
>
>
Well, we talk about parent and child nodes, in that terminology a
removed-by-n parent of a child would be a distant forebear. But if it
confuses you it will confuse others too. I'll change the name to
"insertion_root".

> 230           if (_largest_size_added < n->size) {
> 231             _largest_size_added = n->size;
> 232           }
>
> This is unnecessary as it is already done by add_block(MetaWord* p,
> size_t word_size). I also think that add_block is the better place to
> have the update. The method maybe could be static after this change? I
> also think the opposite decrease update should be done in remove_block
> to keep the symmetry.
>

Good catch, but I think otherwise: the better place would be consistently
in the lower regions, in insert(), since this bookkeeping is only necessary
when adding a new unique block size. Then, for symmetry, it should happen
in remove_block_from_tree.

Also, I rather keep bookkeeping functionality as close as possible to where
the mutation happens. Among other reasons because the code may change at
some point, someone calls remove_node_from_tree() from somewhere else,
bookkeeping has to be still correct.

> There are at least two recursive calls of insert that could be
> tail-called instead (it would be somewhat harder to read, so I am not
> proposing it). However, with an unbalanced tree I guess we
> could get deep call stacks if we are unlucky. I guess it is best to
> leave this as is, just wanted to comment on it.
>
>
Ok

> 278   // Given a node n, remove it from the tree and repair tree.
> 279   void remove_node_from_tree(node_t* n) {
> 280
> 281     assert(n->next == NULL, "do not delete a node which has a
> non-empty list");
>
> Is there any reason for not calling the remove_from_list logic
> here instead of in get_block? We would get rid of this assert, and I
> think it would be better placed logically.
>
>
Hmm I rather like methods which have only one job. We have remove_from_tree
and remove_from_list. I can add a multiplexer function, e.g. remove()
(which would be symmetric with insert()) in between get_block and
remove_from_tree/list. I'll play around a bit.

> 283     // Maintain largest size node to speed up lookup
> 284     if (n->size == _largest_size_added) {
> 285       node_t* pred = predecessor(n);
> 286       if (pred != NULL) {
> 287         _largest_size_added = pred->size;
> 288       } else {
> 289         _largest_size_added = 0;
> 290       }
> 291     }
>
> As I said above, the _largest_size_added logic seems (to me) to fit
> better in remove_block(). It is not really a tree operation but a cached
> result.
>
>
See remarks above.

> I also wonder if the caching is really needed.
>
> If we have a huge tree, /and/ we fail to find a free area
> /repeatedly/, the caching will perform better.
>
> But If the tree is empty, it will be as fast to check anyway, and I
> guess that is the common case).
>
> Every time we remove an element from the tree, the predecessor()
> function needs to be called at the end (just) to update the cached
> value, so each remove (that successfully fetch free memory) is a bit
> more expensive with caching.
>
> Just a thought.
>

Maybe, I'll think about this.

It was thought to be a safety valve. However, I think now that a different
form of valve could be a "good enough" logic. When removing a block from a
tree, it is not highly important to have an ideal match. We will retrieve a
block and chop off the remainder to be reused later. Especially if the
remainder space is small enough to fit into the binlist (O(1) insertion) it
is already a very good fit.

So I may just add a stop at an arbitrary depth. If we are e.g. 30 levels
deep in this tree something has gone deeply wrong and the tree has
degenerated. May just as well take that block then and use it. Even if
there is too much remaining space, and if it will be re-added to the tree,
it will end up in a different tree branch and thus reform the tree.

I still think a better fit would be a rbtree, like the linux kernel has. I
have a prototype somewhere but did not get it waterproof, then ran out of
time. It also seemed too big a gun for just this one use case.

> 379     node_t* n = (node_t*)p;
> 380     n->size = word_size;
> 381     n->next = n->left = n->right = n->parent = NULL;
>
> Placement new would look better in my opinion.
>

Okay, I'll change that.

>
> Thanks,
> Leo
>

Thanks for the review!

Thomas