RFR: 8300487: Store cardinality as a field in BitSet [v5]

John R Rose jrose at openjdk.org
Thu Jan 19 23:16:33 UTC 2023


On Wed, 18 Jan 2023 12:43:31 GMT, fabioromano1 <duke at openjdk.org> wrote:

>> The enanchment is useful for applications that make heavy use of BitSet objects as sets of integers, and therefore they need to make a lot of calls to cardinality() method, which actually require linear time in the number of words in use by the bit set.
>> This optimization reduces the cost of calling cardinality() to constant time, as it simply returns the value of the field, and it also try to make as little effort as possible to update the field, when needed.
>> 
>> Moreover, it has been implemented a new method for testing wheter a bit set includes another bit set (i.e., the set of true bits of the parameter is a subset of the true bits of the instance).
>
> fabioromano1 has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Added author and reverse the cicle order in includes(BitSet)

I'll pile on:  This optimization doesn't buy much in today's world, where most machines execute `bitCount` in one cycle.  It saves a trivial loop.  Over very large bitsets that saves something, but most bitsets are likely to be medium to small (across all use cases).  And for large bitsets, if there is a problem with cardinality testing, the user should provide for it separately.  Perhaps as a class which contains a bitset, plus the extra cached information.  You might also cache the index of the first and last words containing a set bit.  There's no end to what one might cache:  It's data, and you can derive information from it.  That's a job for users, not for the library.

Also, as folks have noted, adding a speedup in one place pushes slowdowns in other places.  Worst of all, there are new race conditions in the "improved" data structure.

I don't recommend this change.  If the data structure were immutable, I might think otherwise (as String caches its hashcode), but the race conditions really spoil it for me.

-------------

PR: https://git.openjdk.org/jdk/pull/11837


More information about the core-libs-dev mailing list