RFR: 8345314: Add a red–black tree as a utility data structure [v12]

Tue Jan 28 14:15:05 UTC 2025

On Thu, 16 Jan 2025 12:28:52 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>>> Hi @jdksjolen, @caspernorrbin
>>> 
>>> sorry for the delay, work is piled up.
>>> 
>>> > It seems like you want an intrusive RBTree. That's really no problem, it just means that we have to extend the API slightly and make it so that you can avoid using an allocator.
>>> 
>>> intrusive - I did not know this term, even though I wrote containers like this all my life, go figure :-).
>>> 
>>> Yes, that's precisely what I want.
>>> 
>>>     * I want to create/store nodes myself
>>> 
>>>     * I want to avoid copying node content; in fact, I don't want to even bother with assignment operators.
>>> 
>>>     * I want to be able to add child nodes to a node to handle duplicates if my tree allows duplicates (which is of no concern to the tree, but since these child nodes should live in the same pool, again, my wish for wanting to create nodes myself)
>>> 
>>> 
>>> All of this points to a tree where I manage the node myself. So, have a function to find a reference to an existing node by key and have a function to hand in a node for storage. Pretty much like you line it out in your example:
>>> 
>>> ```
>>> MyTree::add_my_bulky_and_possible_varsized_node(Node* n) {
>>> 	- find node n2 with key n->key
>>> 		- found? Add n as child node under n2
>>> 		- not found? add n as new node
>>> }
>>> ```
>>> 
>>> I have two immediate examples where I want to use this RB tree: the reworked c2 memory statistic (in fact, I withhold the patch to wait for this RB tree to drop) and - somewhat later - the metaspace binary tree dictionary replacement. Both cases would be perfectly served with an intrusive key that allows me to handle duplicates.
>> 
>> This requires the tree to have nodes `<` strictly on the left, and `>=` strictly on the right (as opposed to `>` strictly on the right), or the opposite. I have no clue what having duplicates does to a RBTree when rebalancing. If it's that important to have duplicates, then why have them live inside the tree anyway? Just link them together with an intrusive linked list.
>> 
>> ```c++
>> struct Foo {
>>   RBNode rb; // with some K as key
>>   Foo* next;
>>   struct { /* ... */ } data;
>> };
>> 
>> void a_m_b_a_p_v_n(Foo* n) {
>>   Cursor c = tree.find_insertion_point(n);
>>   if (c.exists()) {
>>     Foo* f = Foo::cast_to_self(c.node());
>>     n->next = f->next;
>>     f->next = n->next;
>>   }
>>   tree.insert(c, n);
>> };
>> 
>> 
>> Just do that. Let's us entirely skip the headache of the rest of it, at the cost of 8 bytes.
>>  
>>> > I do want an allocator instance in the tree, beca...
>
>> > Hi @jdksjolen, 
> 
>> > 
>> > I have two immediate examples where I want to use this RB tree: the reworked c2 memory statistic (in fact, I withhold the patch to wait for this RB tree to drop) and - somewhat later - the metaspace binary tree dictionary replacement. Both cases would be perfectly served with an intrusive key that allows me to handle duplicates.
>> 
>> This requires the tree to have nodes `<` strictly on the left, and `>=` strictly on the right (as opposed to `>` strictly on the right), or the opposite. I have no clue what having duplicates does to a RBTree when rebalancing. If it's that important to have duplicates, then why have them live inside the tree anyway? Just link them together with an intrusive linked list.
>> 
> 
> You misunderstood me - a linked list is exactly what I meant. That is why I talked about a "child node". I did this in metaspace: https://github.com/openjdk/jdk/blob/f64f22b360f68df68ebb875bd0ef08ba61702952/src/hotspot/share/memory/metaspace/blockTree.hpp#L89-L96
> 
> I can do that myself, as I wrote, this does not concern the tree. So for the tree we can keep things without duplicates. But for simplicity the easiest solution is for child nodes to be of the Node class itself, because then you can just swap them around and make them the leader node for this key value later. That would require a child Node to be allocated in the same pool as the "real" tree nodes, and have the same type as well. All of that can happen outside the tree. It is, however, an argument for giving the caller the ability to allocate and manage nodes himself and passing them in pre-formed for insertion.
> 
>> > > I do want an allocator instance in the tree, because then you can make very nice things such as a `IndexWithFreeList` allocator that doesn't need protection from concurrent writes.
>> > 
>> > 
>> > You lost me here
>> 
>> A static allocator either needs access to local state via a pointer on allocation, or it needs to have some form of protection against concurrent writes. This code below requires a mutex in `alloc`, because it is static.
>> 
>> 
> 
> Oh, sure. But again, not a strong argument for non-intrusiveness. In my original comment, I argued for having the ability to pass in an allocator instance into the destructor, or alternatively use a token of some sort when calling functions in a AllStatic allocator that identifies the tree.
> 
> But as I wrote, I think we don't have to choose. We can have an allocator instance as optional ctor argument, and let the tree ...

> @tstuefe Did you have any more feedback on this? Would like to get this pushed soon so I can publish the intrusive variant.

Sorry, I have not forgotten you, but am booked out with FOSDEM preparations. I can continue the review when I come back (end of next week), or the week after.

I would have liked to prod at the meat of the insertion/deletion logic more, since I remember that part being particularly fiddly. We should get this right, this needs close review.

But I don't want to hold you up, so if you feel you need to push now, go ahead. I would definitely like to look at the intrusive variant, since I expect I will use that one quite a bit.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22360#issuecomment-2619117723