RFR: 8345314: Add a red–black tree as a utility data structure [v12]

Sun Jan 12 08:48:44 UTC 2025

On Fri, 10 Jan 2025 10:03:22 GMT, Casper Norrbin <cnorrbin at openjdk.org> wrote:

>> Hi everyone,
>> 
>> This effort began as an exploration of replacing the current NMT treap with a red-black tree. Along the way, I discovered that others were also interested in having a general-purpose tree structure available within HotSpot.
>> 
>> The red-black tree is designed to serve as a drop-in replacement for the existing NMT treap, keeping a nearly identical interface. However, I’ve also added a few additional requested features, such as an iterator.
>> 
>> Testing builds off the treap tests, adding a few extra that inserts/removes and checks that the tree is correct. Testing uses the function `verify_self`, which iterates over the tree and checks that all red-black tree properties hold. Additionally, the tree has been tested in vmatree instead of the treap without any errors.
>> 
>> For those who may want to revisit the fundamentals of red-black trees, [Wikipedia](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) offers a great summary with tables covering the various balancing cases. Alternatively, your favorite data structure book could provide even more insight.
>
> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   clarified comment

I am playing around with this tree to replace the backing storage of the compilation memory statistics. Some notes:

Why does the tree _own_ the allocator? Why is its lifetime bound to the tree? This restricts the usefulness of an allocator. Two trees have two different allocator instances, but since we have no way of telling apart the caller tree instances in the allocator code, there is no point in that. We might just as well make the allocator AllStatic then and call via `ALLOCATOR::allocate`.

Consider this scenario: I want to have two short-lived trees, nodes live longer than their trees they temporarily occupy, and I want the nodes to reside in physically different preallocated pools. I want to use the same allocator and tree classes, though, so template parameters must be the same.

There are two ways to do this. The most obvious would be to keep a _reference_ to an allocator instance that the caller hands in via constructor. So, make `_allocator` a pointer, and if omitted by the caller use your own C-heap allocator. Otherwise, use the caller's allocator.

Another way would be to make the Allocator AllStatic and give each tree a caller-defined token (e.g. void*, think pthread_create), then hand this into all allocator functions. Tbh, the first variant sounds easier.

The node tree must also handle OOM because maybe I don't want to exit on OOM. An OOM from the node allocator itself may be a normal and expected behavior. 

Consider this example: I want to use nodes from a limited pre-allocated buffer, and I want to handle buffer OOM by, e.g., deleting the leftmost nodes and retrying. Maybe I want to keep the N largest "somethings" during a VM run within a limited buffer. In that case, an allocator OOM would be the most natural and understandable way to implement this. As the tree works now, we would need to predict and forestall an OOM by counting nodes *before* adding a new one. That is awkward and maybe not even possible if the allocator OOM is not easily predictable (think variable-sized nodes, or allocator fragmentation issues).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22360#issuecomment-2585643391