RFR: 8300487: Store cardinality as a field in BitSet [v5]

Thu Jan 19 21:18:51 UTC 2023

On Wed, 18 Jan 2023 12:43:31 GMT, fabioromano1 <duke at openjdk.org> wrote:

>> The enanchment is useful for applications that make heavy use of BitSet objects as sets of integers, and therefore they need to make a lot of calls to cardinality() method, which actually require linear time in the number of words in use by the bit set.
>> This optimization reduces the cost of calling cardinality() to constant time, as it simply returns the value of the field, and it also try to make as little effort as possible to update the field, when needed.
>> 
>> Moreover, it has been implemented a new method for testing wheter a bit set includes another bit set (i.e., the set of true bits of the parameter is a subset of the true bits of the instance).
>
> fabioromano1 has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Added author and reverse the cicle order in includes(BitSet)

> On Thu, Jan 19, 2023 at 4:29 AM fabioromano1 ***@***.***> wrote: Like other reviewers, changing the performance tradeoffs in BitSet make me uncomfortable. 30 years of code has adapted to the current performance tradeoffs. Those users who really need O(1) cardinality() can fairly easily implement it in client code (and probably have). The spirit of BitSet is space efficiency, and adding another field negates that. cardinality() is O(N) but in practice should be very efficient due to linear traversal and (presumably) the use of hardware bitcount instructions for each word (fix that if not true). There may be a case for a BitSet rewrite (e.g. to support large sparse bitsets) once we have compact value types. Leaving to the client the responsibility of implementation isn't against the principle of information hiding and encapsulation?
> The client already has a pretty good implementation of cardinality(). One can see the effort the JVM engineers put into optimizing bitCounting arrays of longs. The proposed change is an example of something that happens a lot - something helps a particular use case while extracting a small but perpetual tax on all other users of the JDK, (here including future maintainers of BitSet). Adding a redundant field to BitSet increases the risk of not keeping the new field in sync. Are we sure we've tested that every mutating method correctly updates the new field? What about new methods added in future years?
> […](#)
> --- Just to show that we care about BitSet performance, here's a (start of a) change that I **would** make:
> --- a/src/java.base/share/classes/java/util/BitSet.java +++ b/src/java.base/share/classes/java/util/BitSet.java @@ -897,6 +897,8 @@ public class BitSet implements Cloneable, java.io.Serializable { * @SInCE 1.4 */ public int cardinality() { + final int wordsInUse = this.wordsInUse; + final long[] words = this.words; int sum = 0; for (int i = 0; i < wordsInUse; i++) sum += Long.bitCount(words[i]); I bet that provides a measurable performance improvement, at least with some current VMs. But probably warmed up C2 doesn't need this help.

The patch has passed all existing tests of the class. Anyway, I decided to do instead a subclass that implements this enhancement, in order to avoid creating the problems shown here.

-------------

PR: https://git.openjdk.org/jdk/pull/11837