RFR: 8345314: Add a red–black tree as a utility data structure [v12]
Axel Boldt-Christmas
aboldtch at openjdk.org
Thu Jan 16 11:12:50 UTC 2025
On Fri, 10 Jan 2025 10:03:22 GMT, Casper Norrbin <cnorrbin at openjdk.org> wrote:
>> Hi everyone,
>>
>> This effort began as an exploration of replacing the current NMT treap with a red-black tree. Along the way, I discovered that others were also interested in having a general-purpose tree structure available within HotSpot.
>>
>> The red-black tree is designed to serve as a drop-in replacement for the existing NMT treap, keeping a nearly identical interface. However, I’ve also added a few additional requested features, such as an iterator.
>>
>> Testing builds off the treap tests, adding a few extra that inserts/removes and checks that the tree is correct. Testing uses the function `verify_self`, which iterates over the tree and checks that all red-black tree properties hold. Additionally, the tree has been tested in vmatree instead of the treap without any errors.
>>
>> For those who may want to revisit the fundamentals of red-black trees, [Wikipedia](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) offers a great summary with tables covering the various balancing cases. Alternatively, your favorite data structure book could provide even more insight.
>
> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision:
>
> clarified comment
Good work on the RB-tree implementation.
I am no expert in RB-trees but the implementation does not look obviously incorrect. And the gtest and verification looks comprehensive. (Only thing I might have extended is to randomness to also check the visitor and iterators. Something like generating a random ordered set of keys in a different container, insert them in some random order, maybe including insertions and removals of other keys and then check that the visitor and iterator visit this set correctly.)
The RB-tree logic looks good to me.
Except for some small nits most of my feedback is about the API and the iteration.
I would like to avoid GrowableArrays in the implementation, but maybe I am missing a reason why they are preferable. If we could design the iterators to work more like standard C++ iterators (at least simplified but compatible versions) would make them more useful and ergonomic. There is also an opportunity to use such iterators as cursors and create a nice interface for using this tree in many different scenarios.
The second is if we should add more const version of the APIs to make it easier to work with the tree in const context (e.g. pass around the tree as a `const& Tree`). At least `const V* find(const K& key) const` seems useful to have a const `contains`.
Overall I think this is nice. But I would like some feedback on my comments w.r.t. the API before approving.
src/hotspot/share/utilities/rbTree.hpp line 119:
> 117: _allocator.free(node);
> 118: _num_nodes--;
> 119: }
Are we assuming that the key is always trivially destructible? Maybe we should document and statically assert this. Even if it is not, it is usually nice to call the destructor to end the objects lifetime.
src/hotspot/share/utilities/rbTree.hpp line 167:
> 165: // Removes the given node from the tree.
> 166: // Returns true if the node was successfully removed, false otherwise.
> 167: bool remove(RBNode* node);
I find it confusing that removing a node can fail. It seems to be to handle `nullptr`. I'd rather restrict this and have `bool remove(const K& k)` check the return value of `find_node`.
src/hotspot/share/utilities/rbTree.hpp line 183:
> 181: _num_nodes = 0;
> 182: _root = nullptr;
> 183: }
I feel like this could have been done recursively and using the stack as a stack. Or at least skip the heap allocation with fixed sized array `RBNode* stack[128];`.
src/hotspot/share/utilities/rbTree.hpp line 236:
> 234: RBNode* end = closest_gt(addr);
> 235: return Range(start, end);
> 236: }
This seems somewhat out of place. The signature is miss-matched from all the other APIs both in terms of the name and type.
I am wrong in that end is always the next node after start? I think the closest_gt call is unnecessary and we can simply find the next node after start.
src/hotspot/share/utilities/rbTree.hpp line 273:
> 271: NONCOPYABLE(IteratorImpl);
> 272:
> 273: IteratorImpl(const RBTree* tree) : _tree(tree) {
Given that our tree node have a parent pointer. I wonder why this is implemented with a GrowableArray rather than just keeping track of a node and walking the tree on next / previous. (Are we afraid that the pointer chasing back up the tree will be to costly? It seems very rare that you walk up the tree without also dereferencing the data)
This would have multiple benefits. We could make it O(1) space rather than O(log(n)).
I think the biggest benefit is that it can be copy constructible and assignable making it much easier to work with including making it easily adaptable to work like a C++ iterator so that we can have begin() and end(), ranged for loops. Many of the APIs like find, enclosing range, visit etc could be implemented in terms of these, making it less cumbersome to design external algorithms which interacts with the tree rather than having to extend the functionality on the tree directly. It also makes iterators usable while inserting in the tree.
Johan talk about using a cursor to enable inserts and removals (to for example enable an intrusive node). I think we could also design an API which achieves the same with iterators.
```c++
auto it = tree.find_leq(key);
if (cmp(key, *it) != 0) {
Node* node = get_intrusive_node(key);
tree.insert(it, node);
} else {
Node* node = *it;
// Update node inplace or maybe replace it with a new intrusive node.
}
src/hotspot/share/utilities/rbTree.inline.hpp line 444:
> 442: template <typename K, typename V, typename COMPARATOR, typename ALLOCATOR>
> 443: template <typename F>
> 444: inline void RBTree<K, V, COMPARATOR, ALLOCATOR>::visit_range_in_order(const K& from, const K& to, F f) {
This could be written in terms of `closest_leq` and `closest_gt` and just walk the tree between these nodes, and only do `O(log(n))` `CMP` calls instead of `O(log(n) + log(k))` where `k` is the size of the node range.
src/hotspot/share/utilities/rbTree.inline.hpp line 525:
> 523: }
> 524:
> 525: #endif // SHARE_UTILITIES_RBTREE_INLINE_HPP
Missing new line
Suggestion:
#endif // SHARE_UTILITIES_RBTREE_INLINE_HPP
test/hotspot/gtest/utilities/test_rbtree.cpp line 68:
> 66: public:
> 67: void inserting_duplicates_results_in_one_value() {
> 68: constexpr const int up_to = 10;
`constexpr const` looks strange to me.
Suggestion:
constexpr int up_to = 10;
test/hotspot/gtest/utilities/test_rbtree.cpp line 111:
> 109: };
> 110:
> 111: constexpr const int up_to = 10;
Suggestion:
constexpr int up_to = 10;
test/hotspot/gtest/utilities/test_rbtree.cpp line 345:
> 343:
> 344: void test_iterator() {
> 345: constexpr const int num_nodes = 100;
Suggestion:
constexpr int num_nodes = 100;
test/hotspot/gtest/utilities/test_rbtree.cpp line 458:
> 456:
> 457: TEST_VM_F(RBTreeTest, InsertRemoveVerify) {
> 458: constexpr const int num_nodes = 100;
Suggestion:
constexpr int num_nodes = 100;
test/hotspot/gtest/utilities/test_rbtree.cpp line 476:
> 474: { // Repeatedly verify a tree of moderate size
> 475: RBTreeInt rbtree;
> 476: constexpr const int ten_thousand = 10000;
Suggestion:
constexpr int ten_thousand = 10000;
test/hotspot/gtest/utilities/test_rbtree.cpp line 503:
> 501: struct Nothing {};
> 502: RBTreeCHeap<int, Nothing, Cmp, mtOther> rbtree;
> 503: constexpr const int one_hundred_thousand = 100000;
Suggestion:
constexpr int one_hundred_thousand = 100000;
-------------
PR Review: https://git.openjdk.org/jdk/pull/22360#pullrequestreview-2554941459
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1917844013
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1917848967
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1917858528
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1917952102
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1917889019
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1918205042
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1917912263
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1918207402
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1918207817
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1918208223
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1918208506
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1918208806
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1918209157
More information about the hotspot-dev
mailing list