RFR: 8337217: Port VirtualMemoryTracker to use VMATree

Fri Nov 8 10:51:37 UTC 2024

On Thu, 1 Aug 2024 15:44:32 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`.
>  -  A wrapper/helper `RegionTree` is made around VMATree to make some calls easier.
>  -  Both old and new versions exist in the code and can be selected via `MemTracker::set_version()`
>  - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs.
>  - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~  https://bugs.openjdk.org/browse/JDK-8335167.
>  - Adding a runtime flag for selecting the old or new version can be added later.
>  - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time.

Changes requested by gziemski (Reviewer).

More review feedback...

Changes requested by gziemski (Reviewer).

Are we keeping both the linked list based and VMATree based implementations just for the review process, and we will drop the linked list one, once we are done with the review, or are they both going to be checked in?

Is there a concrete advantage here for using lambda expression when iterating committed regions? I'm asking because personally I find 

    while ((committed_rgn = itr.next()) != nullptr) {
      print_committed_rgn(committed_rgn);
    }

simpler and more compact, hence easier to understand, than

    CommittedMemoryRegion cmr;
    VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn, &cmr, [&](CommittedMemoryRegion* crgn) {
      print_committed_rgn(crgn);
      return true;
    });

Same here:

    if (MemTracker::is_using_tree()) {
      CommittedMemoryRegion cmr;
      bool reserved_and_committed = false;
      VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn,
                                                                    &cmr,
                                                                    [&](CommittedMemoryRegion* committed_rgn) {
        if (committed_rgn->size() == reserved_rgn->size() && committed_rgn->call_stack()->equals(*stack)) {
          // One region spanning the entire reserved region, with the same stack trace.
          // Don't print this regions because the "reserved and committed" line above
          // already indicates that the region is committed.
          reserved_and_committed = true;
          return false;
        }
        return true;
      });

      if (reserved_and_committed)
        return;
    }
  }

We are setting `reserved_and_committed` inside lambda, then we need to check it outside. It's messy and harder to follow which `return` belongs to which context.

> > Is there a concrete advantage here for using lambda expression when iterating committed regions? I'm asking because personally I find
> > ```c++
> >     while ((committed_rgn = itr.next()) != nullptr) {
> >       print_committed_rgn(committed_rgn);
> >     }
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > simpler and more compact, hence easier to understand, than
> > ```c++
> >     CommittedMemoryRegion cmr;
> >     VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn, &cmr, [&](CommittedMemoryRegion* crgn) {
> >       print_committed_rgn(crgn);
> >       return true;
> >     });
> > ```
> 
> Hi Gerard, I'm the one who prefers passing in a lambda and introduced that style, sorry :-).
> 
> First off, I think the lambda code should be simplified, it should be:
> 
> ```c++
> VMWT::tree().visit_committed_regions_of(*reserved_rgn, [](const CommittedMemoryRegion& crgn) {
>   print_committed_region(crgn));
>   return true;
> });
> ```
> 
> I don't think that's too bad, right? The `return true` is a bit unfortunate. It's for being able to stop early, which I agree that regular iterators do better by just performing a `break;`.
> 
> I'll go ahead and say one nice thing too: If the body of the lambda is a bit more complicated, then we can look at the capture list (the `[]` above) to see what the lambda can alter in the outside scope. With a while-loop, you really have to read the whole thing.
> 
> **The reason** that I write these kinds of iterators is that they're much simpler to implement and maintain and, _to me_, STL-style iterators aren't easier to read, it's about the same level of complexity.
> 
> I'll admit that your style of iterators (which are _not_ STL) is about the same complexity, though I find it unfortunate that we have to write an entire class for each iterator.
> 
> With the simplifications I made, is this style of iterator acceptable?

hi Johan,

Yes, this does look better. I looked at https://www.geeksforgeeks.org/when-to-use-lambda-expressions-instead-of-functions-in-cpp to see when one should consider using lambda and I like one particular scenario - improving readability by implementing the code locally, so one can see exactly what is going on without having to look inside a function. I think this is our use case scenario here.

If only we didn't need all those `return` statements, I'd have clearly preferred lambda here in fact.

Is the plan to check-in the fix with both paths? Or are we going to remove the linked-list based one after the review?

Did you consider using the single lambda in `report_virtual_memory_region()` as I suggested?

outputStream* out = output();
const char* scale = current_scale();
auto print_rgn = [&](const VirtualMemoryRegion* rgn, const NativeCallStack* stack, 
                     MEMFLAGS mem_type = MEMFLAGS::mtNone,
                     const char* region_type = "committed",
                     bool committed = true) {
  // Don't report if size is too small
  if (amount_in_current_scale(rgn->size()) == 0) return;
  out->cr();
  INDENT_BY(committed?8:0,
            print_virtual_memory_region(region_type, rgn->base(), rgn->size());
            if (stack->is_empty()) {
    out->cr(); } else {
      if (!committed) {
        out->print(" for %s", NMTUtil::flag_to_name(mem_type));
      } out->print_cr(" from ");
      INDENT_BY(4, _stackprinter.print_stack(stack);)
    }
  )
};

and then:

bool all_committed = reserved_rgn->size() == reserved_rgn->committed_size();
const char* region_type = (all_committed ? "reserved and committed" : "reserved");
print_rgn(reserved_rgn, reserved_rgn->call_stack(), reserved_rgn->flag(), region_type, false);

and

if (MemTracker::is_using_sorted_link_list()) {
  CommittedRegionIterator itr = reserved_rgn->iterate_committed_regions();
  CommittedMemoryRegion* crgn;
  while ((crgn = itr.next()) != nullptr) {
    print_rgn(crgn, crgn->call_stack());
  }
}
if (MemTracker::is_using_tree()) {
  VirtualMemoryTrackerWithTree::Instance::tree()->visit_committed_regions(*reserved_rgn, [&](CommittedMemoryRegion& crgn) {
    print_rgn(&crgn, crgn.call_stack()); 
    return true;
  });
}

I honestly do not think we should be checking in 2 implementations, unless they are nicely hidden away. I do not like the way we manage them right now with 2 explicit `if` checks, each and every time we need to do something.

If you could push the 2 implementations into a factory class, so that instead of:

   if (is_using_sorted_link_list())
       VirtualMemoryTracker::snapshot_thread_stacks();
     if (is_using_tree())
       VirtualMemoryTrackerWithTree::Instance::snapshot_thread_stacks();

we have just:

`  VirtualMemoryTracker::snapshot_thread_stacks();
`
and 

  VirtualMemoryTracker::VirtualMemoryTracker(bool useLinkedList) {
    if (useLinkedList) {
      instance = new VirtualMemoryTrackerWithLinkedList()
    } else {
      instance = new VirtualMemoryTrackerWithLinkedList()
    }
  }

  VirtualMemoryTracker::snapshot_thread_stacks() {
    instance.snapshot_thread_stacks();
  }

I still haven't run the benchmarks to verify the speed increase promise. Ideally, I would like to do this with my mechanism, but will only do it, if I can manage it without getting sucked into working on the mechanism itself.

I will definitively want to run the provided microbenchmarks though.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 279:

> 277:                                 "Cannot commit verification bitmap memory");
> 278:     }
> 279:     MemTracker::record_virtual_memory_type(verify_bitmap.base(), verify_bitmap..size(), mtGC);

Does this even compile?

src/hotspot/share/nmt/memReporter.cpp line 440:

> 438:   if (all_committed) {
> 439:     if (MemTracker::is_using_sorted_link_list()) {
> 440:     CommittedRegionIterator itr = reserved_rgn->iterate_committed_regions();

Indentation.

src/hotspot/share/nmt/memReporter.cpp line 467:

> 465: 
> 466:       if (reserved_and_committed)
> 467:         return;

This looks better, but now that I got more comfortable with anonymous local functions using lambda, I think we can do better. We can write a single `print_rgn` lambda function that we can use in both "reserved" and "committed" cases:

  outputStream* out = output();
  const char* scale = current_scale();
  auto print_rgn = [&](const VirtualMemoryRegion* rgn, const NativeCallStack* stack,
                       MEMFLAGS mem_type = MEMFLAGS::mtNone,
                       const char* region_type = "committed",
                       bool committed = true) {
    // Don't report if size is too small
    if (amount_in_current_scale(rgn->size()) == 0) return;
    out->cr();
    INDENT_BY(committed?8:0,
      print_virtual_memory_region(region_type, rgn->base(), rgn->size());
      if (stack->is_empty()) {
        out->cr();
      } else {
        if (!committed) {
          out->print(" for %s", NMTUtil::flag_to_name(mem_type));
        }
        out->print_cr(" from ");
        INDENT_BY(4, _stackprinter.print_stack(stack);)
      }
    )
  };

src/hotspot/share/nmt/memReporter.cpp line 478:

> 476:     stack = committed_rgn->call_stack();
> 477:     out->cr();
> 478:     INDENT_BY(8,

This entire section looks messy and in need of some cleanup. We have lines that are only relevant to one implementation not the other, used in both. We share `committed_rgn` name between common path and old impl, then use `crgn` in the new impl.

May I suggest something like:

  auto print_committed_rgn = [&](CommittedMemoryRegion* crgn) {
    // Don't report if size is too small
    if (amount_in_current_scale(crgn->size()) == 0) return;
    stack = crgn->call_stack();
    out->cr();
    INDENT_BY(8,
      print_virtual_memory_region("committed", crgn->base(), crgn->size());
      if (stack->is_empty()) {
        out->cr();
      } else {
        out->print_cr(" from");
        INDENT_BY(4, stack->print_on(out);)
      }
    )
  };
  if (MemTracker::is_using_sorted_link_list()) {
    CommittedRegionIterator itr = reserved_rgn->iterate_committed_regions();
    CommittedMemoryRegion* crgn;
    while ((crgn = itr.next()) != nullptr) {
      print_committed_rgn(crgn);
    }
  }
  if (MemTracker::is_using_tree()) {
    CommittedMemoryRegion cmr;
    VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn, &cmr, [&](CommittedMemoryRegion* crgn) {
      print_committed_rgn(crgn);
      return true;
    });
  }

src/hotspot/share/nmt/memReporter.cpp line 482:

> 480:       } else {
> 481:         out->print_cr(" from");
> 482:         INDENT_BY(4, stack->print_on(out);)

Why aren't we using `_stackprinter` here to print, like in all the other places? 

`    INDENT_BY(4, _stackprinter.print_stack(stack);)
`

src/hotspot/share/nmt/memTracker.hpp line 67:

> 65:     if (is_using_tree())
> 66:       VirtualMemoryTrackerWithTree::Instance::snapshot_thread_stacks();
> 67:   }

I'd prefer to see just:

  static void snapshot_thread_stacks() {
       VirtualMemoryTracker::snapshot_thread_stacks();
   }

and:

  static void VirtualMemoryTracker::snapshot_thread_stacks() {
     if (is_using_sorted_link_list())
       VirtualMemoryTrackerWithLinkedList::Instance::snapshot_thread_stacks();
     if (is_using_tree())
       VirtualMemoryTrackerWithTree::Instance::snapshot_thread_stacks();
   }

Basically hide the two implementations and then we don't need to expose them. All the APIs such as `is_using_tree()` would not need to be public, but just implementation detail.

If we wanted to go one step further, we could create an instance of either one at the init time, then keep using that instance, instead of checking each time we want to do something.

src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 94:

> 92:     if (si._stack_index < 0 || si._stack_index >= _stacks.length()) {
> 93:       return _fake_stack;
> 94:     }

Is that a leftover from debugging?

Shouldn't this be an `assert` in final code?

src/hotspot/share/nmt/nmtTreap.hpp line 219:

> 217:       }
> 218:       last_seen = node;
> 219:       return true;

Why are we returning a `bool` value in `void` function?

src/hotspot/share/nmt/nmtTreap.hpp line 320:

> 318:       }
> 319:       head = to_visit.pop();
> 320:       if(!f(head))

Needs space `if (!f(head))`

src/hotspot/share/nmt/nmtTreap.hpp line 348:

> 346:       const int cmp_to = COMPARATOR::cmp(head->key(), to);
> 347:       if (cmp_from >= 0 && cmp_to < 0) {
> 348:         if(!f(head))

Needs space `if (!f(head))`

src/hotspot/share/nmt/nmtTreap.hpp line 356:

> 354:         head = nullptr;
> 355:       }
> 356: 

Remove.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2228842744
PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2236512021
Changes requested by gziemski (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2246166901
PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2331868978
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2276641131
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2276654335
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2276656880
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2278278886
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2307806504
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2316185120
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2377524395
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2377532616
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1777443492
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1710280816
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1717432884
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1710318264
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1715887989
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722171034
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722186507
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1777449328
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722187617
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722188013
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722188713