Primitive boolean array packing

Andrew Haley aph at
Mon Oct 8 15:19:26 UTC 2018

On 10/08/2018 12:16 PM, B. Blaser wrote:
> The current prototype was intended to scrupulously implement what the
> JVMS explicitly allows for baload/bastore instructions.
> But helper intrinsics might be good alternatives too, we could try both.

So I tried it with a rough C++/assembly language test, and it doesn't
make much sense to use this for default boolean arrays. The update
sequence on AArch64 is

        0: ldxrb w2, [x0];
        and w2, w2, w1
        stxrb w3, w2, [x0]
        cbnz w3, 0b

(combined with a bunch of uninteresting shifts and logical
operations.)  We have to load a byte in exclusive state, AND or OR it
as appropriate, then store it, still in exclusive state. That gets us
the atomicity we need.

10**9 random set operations on an 8 Mbit array take 0.550s; with a
boolean array the time is 0.150s. This is the local overhead of the
load/store exclusive.

> Note that the existing BitSet needs external sync (most probably on
> the whole set) when used in multi-threaded environments.
> With packed baload/bastore instructions or dedicated intrinsics, the
> sync would be on smaller 8-bit blocks or no sync at all on some arch
> as Roman mentioned (AArch64).

It might help. The question is how much we need large bit
arrays. They're certainly useful for things like Bloom filters.

Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

More information about the compiler-dev mailing list