RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21]

Thomas Stuefe stuefe at openjdk.org
Thu Jun 20 05:56:17 UTC 2024


On Mon, 17 Jun 2024 09:15:46 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.
>> 
>> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.
>> 
>> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>> 
>> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>> 
>> The results are as follows on linux-x64-slowdebug:
>> 
>> 
>> Generate stacks... Done
>> Time taken with GrowableArray: 8341.240945
>> Time taken with CHeap: 12189.031318
>> Time taken with Arena: 8800.703092
>> Time taken with GrowableArray again: 8295.508829
>> 
>> 
>> And on linux-x64:
>> 
>> 
>> Time taken with GrowableArray: 8378.018135
>> Time taken with CHeap: 12437.347868
>> Time taken with Arena: 8758.064717
>> Time taken with GrowableArray again: 8391.076291
>> 
>> 
>> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix tests

@jdksjolen sorry, had this review sitting in pending state for days appearantly and forgot to send it off.

One thing I don't understand. I thought the point of this new class is to move the **stacks** over to the new allocator-with-freelist, and make StackIndex a simple typedef for the allocator index? Then, get rid of the GrowableArray?

You put the hashtable entries in there, which is okay, but not using the full potential of this new class.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 34:

> 32: // memory savings if a pointer-heavy self-referential structure is used.
> 33: // It is "indexed" as a reference is base + index * sizeof(E).
> 34: // It never returns any memory to the system.

Proposal:


A flat array of elements E, backed by C-heap, growing on-demand. It allows for 
returning arbitrary elements and keeps them in a freelist. Elements can be uniquely
identified via array index.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 36:

> 34: // It never returns any memory to the system.
> 35: template<typename E, MEMFLAGS flag>
> 36: class IndexedFreeListAllocator {

This is a mouth-full and does not describe well what it does. How about "HomogenousObjectHeap" or "HomogenousObjectArray" or "HomogenousObjectSpace"

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 38:

> 36: class IndexedFreeListAllocator {
> 37: public:
> 38:   using I = int32_t;

Proposal, possibly for a follow-up RFE:

- its a pity to bury this in nmt, we may have uses for it elsewhere
- If we only have a small number of elements, I don't want to burn 4 bytes on an index.
- I dislike the fact that we burn half the value range of index to "invalid"
- AFAICS we don't have a index overflow check in allocate (what happens with >2 billion elements?)

So:


- make this a general-purpose container in utilities
- I would make the index type a template parameter, (X) with the rule that it should be an unsigned integral (and STATIC_ASSERT that)
- I would then:


constexpr X nil = std::numeric_limits<X>::max();
constexpr X max = nil - 1;


Then, down in allocate, make sure we don't go over `max`.

Now, If I know that I only need e.g. <255 elements, I can use a single byte as index, that could give me good packing depending on where the index is stored. And we get index overflow checking inbuilt, too.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 82:

> 80:   }
> 81: 
> 82:   void free(I i) {

`free_element` ? Or `return_element`? I dislike `free` as a name because of the conflict with free. For symmetry, then we need `allocate_element`.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 97:

> 95: 
> 96:   const E& at(I i) const {
> 97:     assert(i != nil, "null pointer dereference");

Not needed?

src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 44:

> 42: // We achieve this by using a closed hashtable for finding previously existing NCS:s and referring to them by an index that's smaller than a pointer.
> 43: template<template<typename, MEMFLAGS> class ALLOCATOR>
> 44: class NativeCallStackStorageWithAllocator : public CHeapObj<mtNMT> {

I don't think we need a templatized version of NativeCallStackStorage. Please let's keep it simple and stupid. We will only ever have one form of NativeCallStackStorage.

Should we ever need multiple forms of NativeCallStackStorage simultaneously, we can still templatize.

src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 82:

> 80:     LinkPtr next;
> 81:     StackIndex stack;
> 82:     Link(LinkPtr next, StackIndex v)

Pre-existing. We should never modify the hash table, so the stackindex can be const.

-------------

PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2124550622
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643868189
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643866034
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643872807
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643890313
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643900631
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1646970936
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1646977747


More information about the hotspot-dev mailing list