BitMap vs VectorSet?

John Rose john.r.rose at oracle.com
Thu Oct 27 18:07:08 UTC 2022


This is a good question, and a reasonable time to engage with it as well.

The code in src/hotspot/share/libadt is very old (~30 years) and has not been vigorously maintained, relative to changes in C++ practice.

BitMap is newer and its code is more modern and actively maintained.  (Thanks Kim B. et al.)  In my opinion, it deserves to be used instead of VectorSet.  Having two classes for the same set of tasks is technical debt.  It is the sort of thing that naturally happens on a large project when two groups settle on different solutions for a common problem.  And the sort of thing a large project has to clean up over time.

So, I support taking steps (reasonable, gradual, limited-impact steps) to retire the classes in libadt, replacing them with more modern ones, in src/hotspot/share/utilities.

As you note, Xin, this is somewhat more urgent as we increase our gtest coverage, since we don’t want to develop two sets of new tests for the same application area.

It’s also more urgent as hardware technology advances:  If we choose to work harder to vectorize bitset operations, we don’t want to do that work twice.

It’s also more urgent as software technology advances.  I’m reminded that there are many new algorithms for bit-twiddling (the name Daniel Lemire comes to mind, for one) and (again: get the pattern?) we don’t want to upgrade the algorithms twice.

People working on HotSpot routinely deal with extremely challenging software problems, specific to optimization, garbage collection, concurrency, platform support, performance, and more.  It’s hard to always, also, select well-tested, well-designed shared libraries to help with these problems; sometimes you just have to roll your own.  But while understandable, that tends to create technical debt.  That sometimes leads to failures (unscheduled payments on technical debt) due to lack of test coverage, simply because some work somewhere didn’t use common code but “rolled their own” algorithms.

So, yes, let’s double down on BitSet and retire VectorSet.

(And Dict, while we are at it: There are some really nice hash table classes developed recently by the runtime group; surely we are not too far from merging the job of Dict into that group of classes?)

— John

On 27 Oct 2022, at 0:07, Liu, Xin wrote:

> hi,
>
> Recently, I extended VectorSet(libadt/vectset.hpp) to support
> intersection. Then I found that there is yet another "vector set" called
> 'BitMap'. From my reading, the data structure of BitMap is same as
> VectorSet. BitMap has already supported intersection/union/difference
> and it even owns corresponding gtests.
>
> Given that, why does HotSpot still need to maintain both VectorSet and
> BitMap? I grep VectorSet and there are still many references. Is that
> too troublesome to replace VectorSet with BitMap or there are other
> reasons?
>
> I modified VectorSet because Unique_Node_List(node.hpp) uses it. I
> hesitate now. Should I invest a new gtest for the new code or I just use
> BitMap instead?
>
> thanks,
> --lx


More information about the hotspot-runtime-dev mailing list