RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v15]

Johan Sjölen jsjolen at openjdk.org
Wed Jun 12 14:45:15 UTC 2024


On Wed, 12 Jun 2024 14:34:39 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.
>> 
>> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.
>> 
>> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>> 
>> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>> 
>> The results are as follows on linux-x64-slowdebug:
>> 
>> 
>> Generate stacks... Done
>> Time taken with GrowableArray: 8341.240945
>> Time taken with CHeap: 12189.031318
>> Time taken with Arena: 8800.703092
>> Time taken with GrowableArray again: 8295.508829
>> 
>> 
>> And on linux-x64:
>> 
>> 
>> Time taken with GrowableArray: 8378.018135
>> Time taken with CHeap: 12437.347868
>> Time taken with Arena: 8758.064717
>> Time taken with GrowableArray again: 8391.076291
>> 
>> 
>> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
>
> Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Remove perf test
>  - Add is_nil() to the CHeap and Arena allocators
>  - Hopefully the last fix

I had a go at making the `_idx` field const and hiding the constructor from external users. It turns out this is super painful, as we suddenly need to accommodate `GrowableArray`s long list of expected behavior from `E`. This means implementing constructors/copy constructors/copy assignment for I and BackingElement, much more than just a simple data carrier. Annoying, but doable. Personally, I'm fine leaving it as is, but I'd like the reviewers' opinions on this.

With regards to double frees: I'm not entirely sure on how to implement this yet, as we don't have a tagged union so we can't discern between freeing an actual element and a pointer in the freelist. The obvious, but computationally expensive, solution is to traverse the freelist and see if we find the index which is being freed. The other solution is to, of course, tag the union at debug time, but that adds size to the elements. I think it might be better to  do it with a tag, but that circles back to the issue that Thomas had with `I::_owner`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2163200919


More information about the hotspot-dev mailing list