RFR: 8312132: Add tracking of multiple address spaces in NMT [v34]

Wed Apr 17 06:50:03 UTC 2024

On Tue, 16 Apr 2024 14:19:25 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>> 
>> ## `MemoryFileTracker`
>> 
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>> 
>> ```c++
>>     static MemoryFile* make_device(const char* descriptive_name);
>>     static void free_device(MemoryFile* device);
>> 
>>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                                 MEMFLAGS flag, const NativeCallStack& stack);
>>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> 
>> 
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>> 
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>> 
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> 
>> 
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
>> 
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>> 
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo...
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Style, copyright

Looked closer at the treap. I think we need really good gtests for this.

src/hotspot/share/nmt/nmtTreap.hpp line 53:

> 51:   const K _key;
> 52:   V _value;
> 53:   using Nd = TreapNode<K,V,CMP>;

Can we call this "Node" please :), and nd_pair node_pair?

src/hotspot/share/nmt/nmtTreap.hpp line 66:

> 64:     LEQ // <=
> 65:   };
> 66: 

Both split and merge use recursion, which worries me, since it is not limited in any way I can see. And I think we can rely on the compiler doing tail recursion optimization. Could a degenerated tree cause stack overflows? Can we do this without recursion?

src/hotspot/share/nmt/nmtTreap.hpp line 142:

> 140:     }
> 141:   }
> 142: 

Allocating:

Why do we need to pass in allocators and deallocators as arguments? The only reason I can see is to have per-instance allocators that use tree-local variables. But why does it need access to the tree instance it works for at all? The only thing I see it accesses from the treap is the random seed. That should be using a global seed var, and os::next_random. A global seed var would be perfectly fine in this case, I think. The treap is only accessed under lock protection, right? Even if not, os::random() (the variant using a global seed with CAS) would be a valid alternative.

(Arguably, initializing the rng seed is also not the job of an allocator, but of the function calling the allocator)

So, a simplification would be to have the allocator and deallocator as class template parameters for the treapnode or for the trea. That would also get of the many duplicate definitions of free functions that all call os::free, for instance.

A further simplification would be to have just one Allocator template parameter giving me an allocation and a deallocation function, since they come and work in pairs.

A further simplification would be to have these actually be part of the tree, not the tree node: I prefer the individual nodes in a tree or list or similar to be dumb data holders, and for the containing tree to hold the tree related logic of merging, inserting etc. I think that is the more natural distribution of responsibilities. It also simplifies things that are done on a per-tree basis, like accounting. It avoids having to marshall a lot of calls (e.g. Tree->upsert just delegates to TreeNode->upsert)

src/hotspot/share/nmt/nmtTreap.hpp line 187:

> 185:       free(head);
> 186:     }
> 187:     return nullptr;

? Why does this return anything

src/hotspot/share/nmt/nmtTreap.hpp line 190:

> 188:   }
> 189: };
> 190: 

TreapCHeap: I dislike having all tree logic in a tree class that is already bound to a specific allocation pattern. I may want to reuse the tree with different allocators, e.g. to avoid malloc. Another reason to make the allocator part of the template parameter of a tree.

src/hotspot/share/nmt/nmtTreap.hpp line 196:

> 194:   friend class VMATreeTest;
> 195:   using CTreap = TreapNode<K, V, CMP>;
> 196:   CTreap* root;

What is a CTreap? To my mind, the treap is the tree, and a node in that tree is a treapnode. Can we not just call CTreap a node? And maybe use the same name we used above in TreapNode?

What meaning has the C?

src/hotspot/share/nmt/nmtTreap.hpp line 197:

> 195:   using CTreap = TreapNode<K, V, CMP>;
> 196:   CTreap* root;
> 197:   uint64_t prng_seed;

Leading underscore for member vars?

src/hotspot/share/nmt/nmtTreap.hpp line 213:

> 211:     static const uint64_t PrngModMask = (static_cast<uint64_t>(1) << PrngModPower) - 1;
> 212:     prng_seed = (PrngMult * prng_seed + PrngAdd) & PrngModMask;
> 213:     return prng_seed;

Do we need our own random generator here? Why not use os::next_random?

-------------

PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2005033795
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568283280
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568301951
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568268168
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568304592
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568295979
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568230895
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568228678
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1568229090