RFR: 8345314: Add a red–black tree as a utility data structure [v12]
Johan Sjölen
jsjolen at openjdk.org
Mon Jan 13 13:02:42 UTC 2025
On Mon, 13 Jan 2025 07:33:10 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Thinking about this some more, the fact that tree nodes are allocated internally and are separate entities from the key/value holders precludes the using this tree for memory management (address stable or not). It works, but the separate allocation defeats the purpose. It would be good if, on `upsert`, I could hand in optionally the future node memory itself, and the node should then be placed into/at the start of this memory. And it must be able to deal with the handed-in node memory be variable-sized, of course.
>
>> Thinking about this some more, the fact that tree nodes are allocated internally and are separate entities from the key/value holders precludes the using this tree for memory management (address stable or not). It works, but the separate allocation defeats the purpose. It would be good if, on `upsert`, I could hand in optionally the future node memory itself, and the node should then be placed into/at the start of this memory. And it must be able to deal with the handed-in node memory be variable-sized, of course.
>
> Or, much simpler, let me construct the node itself and hand it into the tree.
Hi @tstuefe ,
>Consider this scenario: I want to have two short-lived trees, nodes live longer than their trees they temporarily occupy, and I want the nodes to reside in physically different preallocated pools. I want to use the same allocator and tree classes, though, so template parameters must be the same.
>There are two ways to do this. The most obvious would be to keep a reference to an allocator instance that the caller hands in via constructor. So, make _allocator a pointer, and if omitted by the caller use your own C-heap allocator. Otherwise, use the caller's allocator.
Or you have a reference as the template type and you alter the constructor of the `RBTree` to take such a type to construct its own allocator with.
It seems like you want an intrusive RBTree. That's really no problem, it just means that we have to extend the API slightly and make it so that you can avoid using an allocator.
I do want an allocator instance in the tree, because then you can make very nice things such as a `IndexWithFreeList` allocator that doesn't need protection from concurrent writes. You could also, for example, avoid the cost of freeing separate elements and eliminate the whole tree at once when the lifetime of the tree has ended (since the lifetime of the allocator is identical). In short: When your tree isn't intrusive, it's very nice to have a custom allocator.
For your use case, you basically need a very similar addition to the API. It'd look something like this in pseudo-code:
```c++
// I hold the data, but the node holds the key
struct IntrusiveHolder {
int data;
RBNode node;
static IntrusiveHolder* cast_to_self(RBNode* node) ; // offset_of() magic
};
struct Compare {
int cmp(int a, int b);
}
template<typename K, typename V, typename Cmp, typename Allocator>
struct RBTree{
public:
enum Dir { left, right };
// Describes the parent node and whether the next node is to be inserted on the left or the right.
struct InsertionCursor {
RBNode* parent; Dir dir;
RBNode* position(); // Get the node (potentially nullptr)
};
// Walks the tree in the obvious order, using Comparator::cmp overload to find the place to insert.
// However does not allocate.
InsertionCursor find_insertion_point(K k);
// Inserts the node at the position and rebalances the tree.
void insert_at_cursor(InsertionCursor, RBNode* node);
// For removals we need to unlink the node from the tree, but keep it alive and then just return the pointer.
RBNode* remove(K k);
};
// An intrusive RBTree doesn't store any V, and it should not use a custom allocator.
struct Empty{};
struct NoopAlloc {
void* allocate(size_t sz) {
assert(false, "no");
return nullptr;
}
void free(void* ptr) {
assert(false, "no");
return;
}
};
template<typename K, typename Cmp>
using IntrusiveRBTree = RBTree<K, Empty, Cmp, NoopAlloc>;
void insert_and_then_remove(IntrusiveRBTree<int, Compare>& tree, int key, int data) {
InsertionCursor cursor = tree.find_insertion_point(key);
if (cursor.position() != nullptr) {
IntrusiveHolder::cast_to_self(cursor.position())->data = data;
} else {
// Our custom allocation scheme is just malloc here
IntrusiveHolder* place = os::malloc(sizeof(IntrusiveHolder), mtTest);
new (place) IntrusiveHolder(data, RBNode());
tree.insert_at_cursor(cursor, key, place->node);
}
RBNode* node = tree.remove(key);
if (node != nullptr) {
os::free(IntrusiveHolder::cast_to_self(node)); // Find the outer structure which malloc knows how to free.
}
}
We could also change the comparator so that it takes `(RBNode*, RBNode*)`, then the key can also be stored in the `IntrusiveHolder` and we can make `K = Empty`. The details on that are slightly fuzzy.
Casper already is working on supporting an intrusive RBTree with an API similar to what I sketched out above, but I'd like it if we can integrate the current version of the tree as it is. With this change we can already remove the NMT Treap and replace it, so that's a nice use case. Plus, there'll always be new features, we can't wait with integration until all possible features are implemented :).
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22360#issuecomment-2587043371
More information about the hotspot-dev
mailing list