RFR: 8345314: Add a red–black tree as a utility data structure

Johan Sjölen jsjolen at openjdk.org
Mon Dec 2 15:46:17 UTC 2024


On Mon, 25 Nov 2024 13:11:52 GMT, Casper Norrbin <cnorrbin at openjdk.org> wrote:

> Hi everyone,
> 
> This effort began as an exploration of replacing the current NMT treap with a red-black tree. Along the way, I discovered that others were also interested in having a general-purpose tree structure available within HotSpot.
> 
> The red-black tree is designed to serve as a drop-in replacement for the existing NMT treap, keeping a nearly identical interface. However, I’ve also added a few additional requested features, such as an iterator.
> 
> Testing builds off the treap tests, adding a few extra that inserts/removes and checks that the tree is correct. Testing uses the function `verify_self`, which iterates over the tree and checks att all red-black tree properties hold. Additionally, the tree has been tested in vmatree instead of the treap without any errors.
> 
> For those who may want to revisit the fundamentals of red-black trees, [Wikipedia](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) offers a great summary with tables covering the various balancing cases. Alternatively, your favorite data structure book could provide even more insight.

Initial review

src/hotspot/share/utilities/rbTree.hpp line 47:

> 45: 
> 46: private:
> 47:   enum Color { BLACK, RED };

`enum Color : uint8_t { BLACK, RED };`

Minimize to 1 byte

src/hotspot/share/utilities/rbTree.hpp line 60:

> 58:     RBNode* _left;
> 59:     RBNode* _right;
> 60:     unsigned int _black_height; // MSB encodes Red/Black, 0 - red, 1 - black

Suggestion: Use explicitly sized type `uint32_t`

src/hotspot/share/utilities/rbTree.hpp line 61:

> 59:     RBNode* _left;
> 60:     RBNode* _right;
> 61:     Color _color;

Move to the end, this avoids padding and might save us some memory.

src/hotspot/share/utilities/rbTree.hpp line 79:

> 77:     #endif // ASSERT
> 78:       _key = new_key;
> 79:     }

Seems unused? Kill it, it's too dangerous to be left alive!

src/hotspot/share/utilities/rbTree.hpp line 89:

> 87:       _black_height &= (-1U >> 1);
> 88:     }
> 89:     RBNode(const K& k, const V& v)

Space missing

src/hotspot/share/utilities/rbTree.hpp line 96:

> 94:       return _parent != nullptr && _parent->_right == this;
> 95:     }
> 96:     bool is_left_child() {

Space

src/hotspot/share/utilities/rbTree.hpp line 317:

> 315:   void fix_violations(RBNode* node) {
> 316:     if(node->is_black()) { // node's value was updated
> 317:       return;                    // Tree is already correct

Fix indentation of comment

src/hotspot/share/utilities/rbTree.hpp line 377:

> 375:   }
> 376: 
> 377:   void remove_inner(RBNode* node) {

Seems like this function is made to remove nodes whose color is black and whose removal would cause an imbalance. Maybe we can have a better name for this function or documentation, and also a few asserts?

src/hotspot/share/utilities/rbTree.hpp line 381:

> 379:     RBNode* parent = node->_parent;
> 380:     while (parent != nullptr) {
> 381:       RBNode* sibling = node->is_left_child() ? parent->_right : parent->_left;

I'm missing why this is guaranteed to not be null

src/hotspot/share/utilities/rbTree.hpp line 474:

> 472:     RBNode* parent = node->_parent;
> 473:     RBNode* left = node->_left;
> 474:     RBNode* right = node->_right;

We don't get that much usage out of these. If you make them references, or `RBNode*&`, then you can stop accessing `node->_left` etc and use these directly.

src/hotspot/share/utilities/rbTree.hpp line 475:

> 473:     RBNode* left = node->_left;
> 474:     RBNode* right = node->_right;
> 475:     if (left != nullptr) { // node has a left only-child

In each branch assert that the other child is missing. Document this assumption in a comment above the function (along with a rudimentary explanation). Also, add in the assertions regarding the nodes' color.

src/hotspot/share/utilities/rbTree.hpp line 482:

> 480:       node->_left->_parent = node->_parent;
> 481:       if (parent == nullptr) {
> 482:         _root = node->_left;

`assert` that node is root.

src/hotspot/share/utilities/rbTree.hpp line 710:

> 708:   }
> 709: 
> 710:   void insert(const K& k, const V& v) {

Why is `insert` needed on top of `upsert`?

src/hotspot/share/utilities/rbTree.hpp line 933:

> 931: 
> 932:     return true;
> 933:   }

I think these are *great* and I want them in. They might be difficult for some reviewers to accept, because they allow you to break stuff like node count. I know that I asked you to implement these, but I think it's best to remove them for now and add them back in a future PR. I'm sorry!!!

-------------

PR Review: https://git.openjdk.org/jdk/pull/22360#pullrequestreview-2458414366
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1856601465
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865455956
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1856602354
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1856603838
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865456938
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865457281
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865616079
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865498056
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865527034
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865471302
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865465101
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1865469976
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1856612496
PR Review Comment: https://git.openjdk.org/jdk/pull/22360#discussion_r1856616952


More information about the hotspot-dev mailing list