RFR: 8251158: Implementation of JEP 387: Elastic Metaspace

Reingruber, Richard richard.reingruber at sap.com
Fri Sep 4 07:29:47 UTC 2020


Hi Thomas,

> > I agree that a little bit of documentation would help. Well if it is the
> > new value that is returned, you could assert like this:
> >
> >   void decrement() {
> >     T new_val = Atomic::add(&_c, T(-1));
> > #ifdef ASSERT
> >     assert(new_val != T(-1), "underflow (0-1)");
> > #endif
> >   }
> >
> > I'm not insisting but from experience I know that imprecise asserts can be
> > a pain. Note also that Atomic::load_acquire(&_c) does not help to get the
> > most recent value of _c. Acquire only prevents subsequent accesses from
> > being reordered with the load_acquire(). In other words: the state that is
> > observed subsequently is at least as recent as the state observed with the
> > load_acquire() but it can be stale.

> I'll think about this. You may be right. I originally wanted to use the
> return value of A::add but shied away because of the bad documentation. Do
> you think it behaves the same way on all platforms?

> Well I guess it does, since the calling code I found is shared.

Yes, semantics should be the same on all platforms.
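For illustration (my sketch, not from the patch), the shared semantics are
add-and-fetch, i.e. the updated value is returned:

  // Every platform implements Atomic::add as add-and-fetch (see the
  // atomic_linux_x86.hpp excerpt further down), so the *new* value comes back:
  volatile int c = 1;
  int v = Atomic::add(&c, -1);
  assert(v == 0, "Atomic::add returns the updated value");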

> Thanks, Richard. I'm currently preparing a new webrev and running tests;
> these changes will have to wait until after that.

Sure thing!

Cheers, Richard.


---------
From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Thursday, September 3, 2020 19:06
To: Reingruber, Richard <richard.reingruber at sap.com>
Cc: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>; Hotspot-Gc-Dev <hotspot-gc-dev at openjdk.java.net>
Subject: Re: RFR: 8251158: Implementation of JEP 387: Elastic Metaspace 
 


On Thu, Sep 3, 2020 at 6:56 PM Reingruber, Richard <richard.reingruber at sap.com> wrote:
Hi Thomas,
 
> Thanks for your review!
 
Welcome! :)
 
> > ====== src/hotspot/share/memory/metaspace/counter.hpp
> >
> >  104   void decrement() {
> >  105 #ifdef ASSERT
> >  106     T old = Atomic::load_acquire(&_c);
> >  107     assert(old >= 1,
> >  108         "underflow (" UINT64_FORMAT "-1)", (uint64_t)old);
> >  109 #endif
> >  110     Atomic::dec(&_c);
> >  111   }
> >
> > I think you could use Atomic::add() which returns the old value and make
> > the assert atomic too:
> >
> >   void decrement() {
> >     T old = Atomic::add(&_c, T(-1));
> > #ifdef ASSERT
> >     assert(old >= 1,
> >            "underflow (" UINT64_FORMAT "-1)", (uint64_t)old);
> > #endif
> >   }
> >
> > Same for increment(), increment_by(), decrement_by(), ...
> >
> >
> Thought so too but Atomic::add seems to return the new value, not the old,
> see e.g. atomic_linux_x86.hpp:
 
> struct Atomic::PlatformAdd {
>  ....
>   D add_and_fetch(D volatile* dest, I add_value, atomic_memory_order order)
> const {
>     return fetch_and_add(dest, add_value, order) + add_value;
>   }
> };
 
> I also checked the callers, those few places I found which do anything
> meaningful with the result seem to expect the new value. See
> e.g. G1BuildCandidateArray::claim_chunk().
 
> It is annoying that these APIs are not documented. I thought this was
> because these APIs are obvious to everyone but me, but it looks like they
> are not.
 
I agree that a little bit of documentation would help. Well if it is the new
value that is returned, you could assert like this:
 
  void decrement() {
    T new_val = Atomic::add(&_c, T(-1));
#ifdef ASSERT
    assert(new_val != T(-1), "underflow (0-1)");
#endif
  }
 
I'm not insisting but from experience I know that imprecise asserts can be
a pain. Note also that Atomic::load_acquire(&_c) does not help to get the most
recent value of _c. Acquire only prevents subsequent accesses from being
reordered with the load_acquire(). In other words: the state that is observed
subsequently is at least as recent as the state observed with the load_acquire()
but it can be stale.

I'll think about this. You may be right. I originally wanted to use the return value of A::add but shied away because of the bad documentation. Do you think it behaves the same way on all platforms?

Well I guess it does, since the calling code I found is shared.
 

 
> > byte_size should also depend on BlockTree::minimal_word_size I think.
> > Something like
> >
> > if (word_size > FreeBlocks::maximal_word_size)
> >   byte_size = MAX2(byte_size, BlockTree::minimal_word_size * BytesPerWord);
> >
> > FreeBlocks::maximal_word_size needs to be defined for this.
> >
> >
> See above. In addition to what I wrote, BlockTree is an implementation
> detail of FreeBlocks, so it should not matter here.
 
You could assert (statically) somewhere that FreeBlocks::maximal_word_size >= BlockTree::minimal_word_size.
But I'm not insisting.
 

Oh sure, I can do that.
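For example (sketch, with the identifiers as named in this thread):

  // No block size may fall between the bin list's upper limit and the
  // tree's lower limit:
  STATIC_ASSERT(FreeBlocks::maximal_word_size >= BlockTree::minimal_word_size);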
 

Thanks,
Richard.
 

Thanks, Richard. I'm currently preparing a new webrev and running tests; these changes will have to wait until after that.

Cheers, Thomas
 

From: Thomas Stüfe <thomas.stuefe at gmail.com> 
Sent: Thursday, September 3, 2020 11:20
To: Reingruber, Richard <richard.reingruber at sap.com>
Cc: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>; Hotspot-Gc-Dev <hotspot-gc-dev at openjdk.java.net>
Subject: Re: RFR: 8251158: Implementation of JEP 387: Elastic Metaspace
 
Hi Richard,
 
Thanks for your review!
 
Please note that the code has changed since the first webrev. While that is not a problem - most of your findings are still valid - a large number of renamings took place (see Leo's and Coleen's mails). 
 
I will use the new naming throughout my answers (e.g. maximal_word_size -> MaxWordSize). Hope that is not too confusing; if in doubt pls look at the HEAD of the sandbox repo, or just wait until the new webrev is done.
 
All remarks inline:
 
<snip>
 
======

You added includes of memory/metaspace.hpp to some files without additional
modifications. How did you choose these files?

To jvmciCompilerToVM.hpp you added an include of collectedHeap.hpp. Why did you
do this?
 
metaspace.hpp is needed for either the MetaspaceType or the MetadataType enum. jvmciCompilerToVM.hpp needs a definition or a forward declaration of CollectedHeap.
 

====== src/hotspot/share/memory/metaspace.cpp

 194       // ... and class spacelist.
 195       VirtualSpaceList* vsl = VirtualSpaceList::vslist_nonclass();
 196       assert(vsl != NULL, "Sanity");
 197       vsl->verify(slow);

In 195 VirtualSpaceList::vslist_class() should be called I suppose.
 
Good catch.
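I.e. line 195 would presumably become:

  VirtualSpaceList* vsl = VirtualSpaceList::vslist_class();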
 
You could reuse the local vsl as you did with the local cm.
Any reason you assert vsl not NULL in 196 but not in the non-class case?
 
A bit inconsistent, yes. I will remove all these asserts. They are not really useful (if any of those were NULL we would crash in a very obvious manner).
 
 637     // CCS must be aligned to root chunk size, and be at least the size of one
 638     //  root chunk.
 639     adjusted_ccs_size = align_up(adjusted_ccs_size, reserve_alignment());
 640     adjusted_ccs_size = MAX2(adjusted_ccs_size, reserve_alignment());

Line 640 is redundant, isn't it?
 
Well, adjusted_ccs_size could have been 0 to begin with. In the greater context I think it cannot be 0 since CompressedClassSpaceSize cannot be set to zero. But that is non-obvious, so I prefer to leave this code in.
 

@@ -1274,7 +798,7 @@
   assert(loader_data != NULL, "Should never pass around a NULL loader_data. "
         "ClassLoaderData::the_null_class_loader_data() should have been used.");

-  MetadataType mdtype = (type == MetaspaceObj::ClassType) ? ClassType : NonClassType;
+  Metaspace::MetadataType mdtype = (type == MetaspaceObj::ClassType) ? Metaspace::ClassType : Metaspace::NonClassType;

   // Try to allocate metadata.
   MetaWord* result = loader_data->metaspace_non_null()->allocate(word_size, mdtype);

This hunk is not needed.
 
Ok
 
====== src/hotspot/share/memory/metaspace/binlist.hpp

  94      // block sizes this structure can keep are limited by [_min_block_size, _max_block_size)
  95      const static size_t minimal_word_size = smallest_size;
  96      const static size_t maximal_word_size = minimal_word_size + num_lists;

_min_block_size/_max_block_size should be minimal_word_size/maximal_word_size.

The upper limit 'maximal_word_size' should be inclusive IMHO:

  const static size_t maximal_word_size = minimal_word_size + num_lists - 1;

That would better match the meaning of the variable name. Checks in l.148 and
l.162 should be adapted in case you agree.
 
Leo and Coleen wanted this too. The new naming will be consistent and follow hotspot naming rules: (Min|Max)WordSize.
 
<snip>
 
  43    // We store node pointer information in these blocks when storing them. That
  44    //  imposes a minimum size to the managed memory blocks.
  45    //  See MetaspaceArene::get_raw_allocation_word_size().

s/MetaspaceArene::get_raw_allocation_word_size/metaspace::get_raw_word_size_for_requested_word_size/

I agree with the comment, but
metaspace::get_raw_word_size_for_requested_word_size() does not seem to take
this into account.
 
It does, it uses FreeBlocks::MinWordSize. FreeBlocks consists of a tree and a bin list. The tree is only responsible for larger blocks (larger than what would fit into the bin list). Therefore the lower limit is only determined by the bin list minimum word size.
 
Since this may not be obvious, I'll beef up the comment somewhat.
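Roughly, the relationship is (hypothetical sketch, not the actual function
body):

  size_t get_raw_word_size_for_requested_word_size(size_t requested) {
    // Freed blocks are fed back into FreeBlocks, so never hand out
    // blocks smaller than FreeBlocks can manage later. BlockTree's
    // larger minimum does not matter here: blocks below its limit are
    // kept in the bin list.
    return MAX2(requested, FreeBlocks::MinWordSize);
  }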
 

  86        // blocks with the same size are put in a list with this node as head.
  89        // word size of node. Note that size cannot be larger than max metaspace size,
 115      // given a node n, add it to the list starting at head
 123      // given a node list starting at head, remove one node from it and return it.

You should begin a sentence consistently with a capital letter (you mostly do it).

 123      // given a node list starting at head, remove one node from it and return it.
 124      // List must contain at least one other node.
 125      static node_t* remove_from_list(node_t* head) {
 126        assert(head->next != NULL, "sanity");
 127        node_t* n = head->next;
 128        if (n != NULL) {
 129          head->next = n->next;
 130        }
 131        return n;
 132      }

Line 129 must be executed unconditionally.
 
Good catch.
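I.e. (sketch):

  // Given a node list starting at head, remove one node from it and
  // return it. List must contain at least one other node.
  static node_t* remove_from_list(node_t* head) {
    assert(head->next != NULL, "sanity");
    node_t* n = head->next;
    head->next = n->next;   // unlink n unconditionally
    return n;
  }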
 

I'd prefer a more generic implementation that allows head->next to be
NULL. Maybe even head == NULL.
 
I don't think that would be much clearer though. We probably could move remove_from_list() up into remove_block(), but for symmetry reasons this would have to be done for add_to_list() too, and I rather like it this way.
 
<snip>
 215      // Given a node n and a node forebear, insert n under forebear
 216      void insert(node_t* forebear, node_t* n) { 
 217        if (n->size == forebear->size) {
 218          add_to_list(n, forebear); // parent stays NULL in this case.
 219        } else {
 220          if (n->size < forebear->size) {
 221            if (forebear->left == NULL) {
 222              set_left_child(forebear, n);
 223            } else {
 224              insert(forebear->left, n);
 225            }
 226          } else {
 227            assert(n->size > forebear->size, "sanity");
 228            if (forebear->right == NULL) {
 229              set_right_child(forebear, n);
 230              if (_largest_size_added < n->size) {
 231                _largest_size_added = n->size;
 232              }
 233            } else {
 234              insert(forebear->right, n);
 235            }
 236          }
 237        }
 238      }

This assertion in line 227 is redundant (cannot fail).
 
That is true for many asserts I add. I use asserts liberally as a guard against bit rot and as documentation. I guess this one could be considered superfluous since the setup code is right above. I will remove that assert.
 

  Leo> There are at least two recursive calls of insert that could be
  Leo> tail-called instead (it would be somewhat harder to read, so I am not
  Leo> proposing it).

I think they _are_ tail-recursions in the current form.
Gcc eliminates them. I checked the release build with gdb:
(disass /s metaspace::FreeBlocks::add_block)

Recursive tail-calls can be easily replaced with loops. To be safe I'd suggest
doing that, or at least adding a 'return' after each call with a comment that
nothing must be added between the call and the return to keep it a
tail-recursion. Maybe that's sufficient... on the other hand we don't know
whether every C++ compiler can eliminate the calls, and stack overflows when
debugging would also be irritating.

 251            return find_closest_fit(n->right, s);
 260            return find_closest_fit(n->left, s);

More tail-recursion. Same as above.
 
I'll rewrite BlockTree to not use recursion.
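For illustration, an iterative insert() could look roughly like this (sketch,
reusing the field and helper names quoted above):

  void insert(node_t* forebear, node_t* n) {
    while (true) {
      if (n->size == forebear->size) {
        add_to_list(n, forebear); // parent stays NULL in this case.
        return;
      } else if (n->size < forebear->size) {
        if (forebear->left == NULL) {
          set_left_child(forebear, n);
          return;
        }
        forebear = forebear->left;   // descend instead of recursing
      } else {
        if (forebear->right == NULL) {
          set_right_child(forebear, n);
          if (_largest_size_added < n->size) {
            _largest_size_added = n->size;
          }
          return;
        }
        forebear = forebear->right;  // descend instead of recursing
      }
    }
  }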
 
 257          assert(n->size > s, "Sanity");

Assertion is redundant.

 262            // n is the best fit.
 263            return n;

In the following example it is not, is it?

          N1:40
           /
          /
       N2:20
          \
           \
         N3:30

find_closest_fit(N1, 30) will return N2 but N3 is the closest fit. I think you
have to search the left tree for a better fit independently of the size of its
root node.
 
Good catch.
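A corrected (and already loop-based) version could look like this (sketch):

  // Return the smallest node with size >= s. Whenever a node fits,
  // remember it and keep searching its left subtree for a smaller fit;
  // this finds N3:30 in the example above.
  node_t* find_closest_fit(node_t* n, size_t s) {
    node_t* best = NULL;
    while (n != NULL) {
      if (n->size >= s) {
        if (best == NULL || n->size < best->size) {
          best = n;
        }
        n = n->left;
      } else {
        n = n->right; // n is too small; fits can only be to the right.
      }
    }
    return best;
  }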
 
 293        if (n->left == NULL && n->right == NULL) {
 294          replace_node_in_parent(n, NULL);
 295    
 296        } else if (n->left == NULL && n->right != NULL) {
 297          replace_node_in_parent(n, n->right);
 298    
 299        } else if (n->left != NULL && n->right == NULL) {
 300          replace_node_in_parent(n, n->left);
 301    
 302        } else {

Can be simplified to:

    if (n->left == NULL) {
      replace_node_in_parent(n, n->right);
    } else if (n->right == NULL) {
      replace_node_in_parent(n, n->left);
    } else {
 
Yes, but I'd rather leave the code as it is; I think it's easier to read that way.
 
 341            // The right child of the successor (if there was one) replaces the successor at its parent's left child.

Please add a line break.

The comments and assertions in remove_node_from_tree() helped to understand the
logic. Thanks!
 
:)
 
====== src/hotspot/share/memory/metaspace/blocktree.cpp

  40    // These asserts prints the tree, then asserts
  41    #define assrt(cond, format, ...) \
  42      if (!(cond)) { \
  43        print_tree(tty); \
  44        assert(cond, format, __VA_ARGS__); \
  45      }
  46    
  47      // This assert prints the tree, then stops (generic message)
  48    #define assrt0(cond) \
  49              if (!(cond)) { \
  50                print_tree(tty); \
  51                assert(cond, "sanity"); \
  52              }

Better wrap into do-while(0) (see definition of vmassert)
 
Ok.
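E.g. (sketch):

  // do-while(0) makes the macro a single statement, so it composes
  // safely with unbraced if-else:
  #define assrt0(cond)            \
    do {                          \
      if (!(cond)) {              \
        print_tree(tty);          \
        assert(cond, "sanity");   \
      }                           \
    } while (0)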
 
<snip>

 110        verify_node(n->left, left_limit, n->size, vd, lvl + 1);

Recursive call that isn't a tail call. Prone to stack overflow. Well I guess you
need a stack to traverse a tree. GrowableArray is a common choice if you want to
eliminate this recursion. As it is only verification code you might as well
leave it and interpret stack overflow as verification failure. 

 118        verify_node(n->right, n->size, right_limit, vd, lvl + 1);

Tail-recursion can be easily eliminated. See comments on blocktree.hpp above.
 
I'll rewrite this to be non-recursive.
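A sketch of the traversal with an explicit stack (GrowableArray as suggested
above; the real code would carry the size limits along with each node, and
_root is an assumed name):

  GrowableArray<node_t*> stack;
  stack.push(_root);
  while (!stack.is_empty()) {
    node_t* n = stack.pop();
    // ... verify n here: size within limits, child/parent links, ...
    if (n->left != NULL)  { stack.push(n->left); }
    if (n->right != NULL) { stack.push(n->right); }
  }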
 
====== src/hotspot/share/memory/metaspace/chunkManager.cpp

The slow parameter in ChunkManager::verify*() is not used.
 
I'll remove all "slow" params from all verifications and recode this to use -XX:VerifyMetaspaceInterval. I think that is easier to use.
 
====== src/hotspot/share/memory/metaspace/counter.hpp

 104   void decrement() {
 105 #ifdef ASSERT
 106     T old = Atomic::load_acquire(&_c);
 107     assert(old >= 1,
 108         "underflow (" UINT64_FORMAT "-1)", (uint64_t)old);
 109 #endif
 110     Atomic::dec(&_c);
 111   }

I think you could use Atomic::add() which returns the old value and make the assert atomic too: 
  void decrement() {
    T old = Atomic::add(&_c, T(-1));
#ifdef ASSERT
    assert(old >= 1,
           "underflow (" UINT64_FORMAT "-1)", (uint64_t)old);
#endif
  }

Same for increment(), increment_by(), decrement_by(), ...
 
Thought so too but Atomic::add seems to return the new value, not the old, see e.g. atomic_linux_x86.hpp:
 
struct Atomic::PlatformAdd {
 ....
  D add_and_fetch(D volatile* dest, I add_value, atomic_memory_order order) const {
    return fetch_and_add(dest, add_value, order) + add_value;
  }
};
 
I also checked the callers, those few places I found which do anything meaningful with the result seem to expect the new value. See e.g. G1BuildCandidateArray::claim_chunk().
 
It is annoying that these APIs are not documented. I thought this was because these APIs are obvious to everyone but me, but it looks like they are not.
 
====== src/hotspot/share/memory/metaspace/metaspaceArena.cpp

There's too much vertical white space, I'd think.

metaspace::get_raw_allocation_word_size() is a duplicate of
metaspace::get_raw_word_size_for_requested_word_size().
metaspace::get_raw_allocation_word_size() is only referenced in comments and
should be removed.
 
Oops. Sure, I'll remove that.
 
<snip>
 
byte_size should also depend on BlockTree::minimal_word_size I think. Something like

if (word_size > FreeBlocks::maximal_word_size)
  byte_size = MAX2(byte_size, BlockTree::minimal_word_size * BytesPerWord);

FreeBlocks::maximal_word_size needs to be defined for this.
 
See above. In addition to what I wrote, BlockTree is an implementation detail of FreeBlocks, so it should not matter here.
 
Thanks Richard. 
 
I will work in your feedback and publish a new webrev shortly.
 
Cheers, Thomas
 

