From dgpickett at aol.com  Sat Mar 16 19:18:28 2024
From: dgpickett at aol.com (David G. Pickett)
Date: Sat, 16 Mar 2024 19:18:28 +0000 (UTC)
Subject: Improving on PriorityQueue and sort speeds generally with my enhanced binomial heap
In-Reply-To: <292889911.4407374.1708533622507@mail.yahoo.com>
References: <292889911.4407374.1708533622507.ref@mail.yahoo.com>
 <292889911.4407374.1708533622507@mail.yahoo.com>
Message-ID: <73953971.4172654.1710616708527@mail.yahoo.com>

Further research on the C++ (gcc/g++) side shows they get even better performance in their open-source priority_queue.hpp, which uses a pairing heap with a vector as the supporting list.

On Wednesday, February 21, 2024 at 11:40:22 AM EST, David G. Pickett wrote:

I discovered the binomial heap while trying various heaps and optimizing their code. It has advantages over an array-based priority queue and most other sorts:

- Because it has very high locality of reference, it is far faster on modern cached and virtual-memory systems. As items are inserted, on collision with an equal-rank tree the losing tree is made a child and is not referenced again until its parent is popped. The root forest averages (1/2) log2(N) trees, one tree for each one bit in N, so 10 trees for N=1023 and 1 for N=1024. Both insert and top deal only with the root nodes of the trees in the root forest; the children just ride along at no cost until their parent is popped.
- Because it minimizes insert cost, O(1), it is faster for many problems where the heap is never emptied before being discarded.
- It is very fast at melding two heaps: just O(log2(N)) in the size of the smaller heap.
- It is duplicate-key tolerant; duplicate keys will top and pop adjacently. I suppose if one wants them eliminated, as in a Set, this is a disadvantage and an inconvenience, since you have to store the duplicates and eliminate them in top or pop. It is still a valid space-speed tradeoff.

The costs and disadvantages of my heap are:

- The heap space is higher than an array, an additional 17-24 bytes per node, but a valid space-speed tradeoff.
- As a sort, it is not a stable sort for duplicate keys, although it is duplicate-key tolerant. You can add a hidden long secondary key recording insert sequence for a stable sort, raising the per-node overhead to 25-32 bytes. Because of the locality of reference, this is still a great space-speed tradeoff.
- Copy constructors need to copy all N nodes, a more complex task than copying an array.
- It's a lot of work to find and upgrade all the slower sorts in the libraries to a new stable sort!

Compared to example binomial heaps, I simplify and speed up my heap:

- I do not attempt to track the min or max during insert, deferring that work to top time for a lower insert cost.
- I aggressively merge each insert into a root forest kept as a forward-linked list of trees with unique, ascending ranks, for maximum locality of reference and the fastest insert and top. The insert is still O(1).
- For top and pop, I force-merge all trees in the root forest regardless of rank, creating trees of rank 1 + max(rank1, rank2). This narrows the current and future width of the root list by pushing many trees far down below the root. Example binomial heaps just search for the min or max, which costs about the same as merging but saves no future sort cost. They also maintain the min/max during insert, which is entirely wasted work if the next operation is not top or pop.

The attached code is not a full-featured replacement for PriorityQueue, but it was sufficient to examine the relative performance as a sort.
I also include test classes to illustrate the relative speeds.
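
To make the scheme above concrete, here is a minimal, illustrative Java sketch of the insert and pop strategy described in this post. It is not the attached benchmark code; the class and member names are invented for the example, duplicate keys are simply kept, and it omits meld, the stable-sort secondary key, and other PriorityQueue features.

    // MiniBinomialHeap.java -- illustrative sketch only, not the attached code.
    import java.util.NoSuchElementException;

    public final class MiniBinomialHeap {
        // Each node heads a heap-ordered multiway tree; siblings and the
        // root forest are forward-linked lists.
        private static final class Node {
            final long key;
            int rank;        // number of direct children
            Node child;      // most recently attached child
            Node next;       // next sibling, or next root in the forest
            Node(long key) { this.key = key; }
        }

        private Node roots;  // root forest, kept rank-sorted by insert
        private int size;

        public int size() { return size; }

        // Link two trees: the larger-keyed root becomes a child of the other.
        private static Node link(Node a, Node b) {
            if (b.key < a.key) { Node t = a; a = b; b = t; }
            b.next = a.child;
            a.child = b;
            a.rank++;
            return a;
        }

        // Insert: carry the new rank-0 tree into the rank-sorted root list,
        // linking with any equal-rank tree it collides with (O(1) amortized).
        public void insert(long key) {
            Node carry = new Node(key);
            while (roots != null && roots.rank == carry.rank) {
                Node head = roots;
                roots = roots.next;
                head.next = null;
                carry = link(carry, head);
            }
            carry.next = roots;
            roots = carry;
            size++;
        }

        // Pop: force-merge every root pairwise regardless of rank, as the
        // post describes, then remove the single remaining root and make its
        // children the new root forest.
        public long pop() {
            if (roots == null) throw new NoSuchElementException("empty heap");
            while (roots.next != null) {
                Node a = roots, b = roots.next, rest = b.next;
                a.next = null;
                b.next = null;
                Node merged = link(a, b);
                merged.next = rest;
                roots = merged;
            }
            Node top = roots;
            roots = top.child;   // children ride up as the new roots
            size--;
            return top.key;
        }

        public static void main(String[] args) {
            MiniBinomialHeap h = new MiniBinomialHeap();
            long[] xs = {5, 1, 9, 3, 3, 7};
            for (long x : xs) h.insert(x);
            while (h.size() > 0) System.out.print(h.pop() + " "); // 1 3 3 5 7 9
            System.out.println();
        }
    }

Note that in this sketch, pop does not restore the strict rank ordering that insert relies on for merging; insert still produces a correct heap either way, the root forest may just stay a little wider until the next pop collapses it again.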