RFR: 8345314: Add a red–black tree as a utility data structure [v12]

Thomas Stuefe stuefe at openjdk.org
Thu Jan 16 08:24:46 UTC 2025


On Mon, 13 Jan 2025 13:00:16 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>>> Thinking about this some more, the fact that tree nodes are allocated internally and are separate entities from the key/value holders precludes the using this tree for memory management (address stable or not). It works, but the separate allocation defeats the purpose. It would be good if, on `upsert`, I could hand in optionally the future node memory itself, and the node should then be placed into/at the start of this memory. And it must be able to deal with the handed-in node memory be variable-sized, of course.
>> 
>> Or, much simpler, let me construct the node itself and hand it into the tree.
>
> Hi @tstuefe ,
> 
>>Consider this scenario: I want to have two short-lived trees, nodes live longer than their trees they temporarily occupy, and I want the nodes to reside in physically different preallocated pools. I want to use the same allocator and tree classes, though, so template parameters must be the same.
> 
>>There are two ways to do this. The most obvious would be to keep a reference to an allocator instance that the caller hands in via constructor. So, make _allocator a pointer, and if omitted by the caller use your own C-heap allocator. Otherwise, use the caller's allocator.
> 
> Or you have a reference as the template type and you alter the constructor of the `RBTree` to take such a type to construct its own allocator with.
> 
> It seems like you want an intrusive RBTree. That's really no problem, it just means that we have to extend the API slightly and make it so that you can avoid using an allocator.
> 
> I do want an allocator instance in the tree, because then you can make very nice things such as a `IndexWithFreeList` allocator that doesn't need protection from concurrent writes. You could also, for example, avoid the cost of freeing separate elements and eliminate the whole tree at once when the lifetime of the tree has ended (since the lifetime of the allocator is identical). In short: When your tree isn't intrusive, it's very nice to have a custom allocator.
> 
> For your use case, you basically need a very similar addition to the API. It'd look something like this in pseudo-code:
> 
> ```c++
> // I hold the data, but the node holds the key
> struct IntrusiveHolder {
>   int data;
>   RBNode node;
>   static IntrusiveHolder* cast_to_self(RBNode* node) ; // offset_of() magic
> };
> 
> struct Compare {
>   int cmp(int a, int b);
> }
> 
> template<typename K, typename V, typename Cmp, typename Allocator>
> struct RBTree{
> public:
>   enum Dir { left, right };
>   // Describes the parent node and whether the next node is to be inserted on the left or the right.
>   struct InsertionCursor {
>    RBNode* parent; Dir dir;
>    RBNode* position(); // Get the node (potentially nullptr)
>   };
> 
>   // Walks the tree in the obvious order, using Comparator::cmp overload to find the place to insert.
>   // However does not allocate.
>   InsertionCursor find_insertion_point(K k);
>   // Inserts the node at the position and rebalances the tree.
>   void insert_at_cursor(InsertionCursor, RBNode* node);
> 
>   // For removals we need to unlink the node from the tree, but keep it alive and then just return the pointer.
>   RBNode*...

Hi @jdksjolen, @caspernorrbin 

sorry for the delay, work is piled up.

> 
> It seems like you want an intrusive RBTree. That's really no problem, it just means that we have to extend the API slightly and make it so that you can avoid using an allocator.

intrusive - I did not know this term, even though I wrote containers like this all my life, go figure :-). 

Yes, that's precisely what I want.

- I want to create/store nodes myself 
- I want to avoid copying node content; in fact, I don't want to even bother with assignment operators.
- I want to be able to add child nodes to a node to handle duplicates if my tree allows duplicates (which is of no concern to the tree, but since these child nodes should live in the same pool, again, my wish for wanting to create nodes myself)

All of this points to a tree where I manage the node myself. So, have a function to find a reference to an existing node by key and have a function to hand in a node for storage. Pretty much like you line it out in your example:


MyTree::add_my_bulky_and_possible_varsized_node(Node* n) {
	- find node n2 with key n->key
		- found? Add n as child node under n2
		- not found? add n as new node
}


I have two immediate examples where I want to use this RB tree: the reworked c2 memory statistic (in fact, I withhold the patch to wait for this RB tree to drop) and - somewhat later - the metaspace binary tree dictionary replacement. Both cases would be perfectly served with an intrusive key that allows me to handle duplicates.


> 
> I do want an allocator instance in the tree, because then you can make very nice things such as a `IndexWithFreeList` allocator that doesn't need protection from concurrent writes.

You lost me here

> You could also, for example, avoid the cost of freeing separate elements and eliminate the whole tree at once when the lifetime of the tree has ended (since the lifetime of the allocator is identical).

This is not a strong argument though, since when I create the allocator outside the tree, I can let it die in the same fashion when the tree dies. This is what I plan to do. But as you wrote, we should allow both intrusive and non-intrusive.

> Casper already is working on supporting an intrusive RBTree with an API similar to what I sketched out above, but I'd like it if we can integrate the current version of the tree as it is. With this change we can already remove the NMT Treap and replace it, so that's a nice use case. Plus, there'll always be new features, we can't wait with integration until all possible features are implemented :).

Hmm, okay. I'll give the patch a closer look later today and then approve it unless I find something.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22360#issuecomment-2594803275


More information about the hotspot-dev mailing list