Bit set intrinsic

B. Blaser bsrbnd at
Wed Oct 31 14:51:34 UTC 2018

On Sun, 21 Oct 2018 at 11:38, Andrew Haley <aph at> wrote:
> On 10/19/2018 04:42 PM, B. Blaser wrote:
> > I tried example [1] with both synchronized BitSet and dedicated
> > intrinsic on x86_64 (8 cores) which revealed that the intrinsic would
> > be at least 2-3x faster than the synchronized set.
> Which makes it very useful for things like highly-concurrent Bloom
> Filters. Here's an example of current usage:
> If we can provide a really fast on/off-heap bitset in standard Java
> the need for this Unsafe hackery will go away.
> It's interesting that the Cassandra Bloom Filter is not thread safe; I
> don't know why this is.

The last but not least, I implemented the c2 part (using the 8-bit
AND/OR variant) to do sharper comparisons also on non-concurrent

With 10e6 iterations the lock latency seems to be more or less
negligible and removing it would make the intrinsic about 10% faster
than BitSet without synchronization.

I used it naively inside RegularEnumSet without optimization nor
atomicity guarantee only to evaluate its robustness and tier1 was

I'm not sure what would be the process for adding something like this
to the JDK (maybe a JEP would be necessary?) but I cannot implement
the intrinsic on other architectures myself.
Also, I guess a more precise API specification along with advanced
profiling would be necessary but I'm probably not an expert in these

In any case, I hope these experiments will be useful for something.

Please let me know of any feedback,

More information about the compiler-dev mailing list