Bit set intrinsic (was: Primitive boolean array packing)

B. Blaser bsrbnd at gmail.com
Fri Oct 19 15:42:59 UTC 2018


On Mon, 8 Oct 2018 at 10:53, Andrew Haley <aph at redhat.com> wrote:
>
> On 10/07/2018 08:54 PM, Aleksey Shipilev wrote:
>
> > Of course, if you do packed boolean[] with locked/CAS-ed writes or
> > otherwise guarantee the absence of word tearing, it does not break
> > the spec.
>
> Exactly. it's an interesting idea for large bit vectors, and it could
> well generate good code. However, it probably makes more sense to
> implement it as a bit vector class with suitable helper intrinsics.

So, I turned the existing prototype into a bit set intrinsic
(System.setBit()) as I agree with Andrew that it would make more sense
than modifying baload/bastore:

http://cr.openjdk.java.net/~bsrbnd/boolpack/webrev.01/

The current x86_64 implementation uses BTS/BTR (bit set and reset)
instructions as Roman suggested with the LOCK prefix to assure
atomicity and thus avoiding any word-tearing problem that Aleksey
mentioned. Note that BTS/BTR are locking 32-bit words but AND/OR
(8-bit) could be used instead, also in conjunction with LOCK.

I tried example [1] with both synchronized BitSet and dedicated
intrinsic on x86_64 (8 cores) which revealed that the intrinsic would
be at least 2-3x faster than the synchronized set.

Then, to the question of how much do we need this, BitSet usages in
the JDK is a beginning of answer:

$ find ./src/ -name '*.java' -type f -print | xargs grep -e "BitSet"
-e "BitArray"

There are also other places like javac flags [2] or EnumSet [3] where
the same thing is remade by hand. Finally, note that using the
intrinsic inside EnumSet would probably make it 2-3x faster than using
a SynchronizedCollection [4] in multi-threaded environments.

While I agree with Maurizio that Panama structs might be well suited
in some situations requiring a rigid layout like javac flags, I think
they are only complementary to bit arrays but not a replacement (see
EnumSet, etc...).

I've seen that Panama is also about exploring hardware optimizations,
so maybe the latest intrinsic prototype could be a good starting point
for this (hotspot:tier1 being OK)?

In any case, feedbacks about this intrinsic would be welcome.

Thanks,
Bernard

[1] https://docs.oracle.com/javase/specs/jls/se11/html/jls-17.html#jls-17.6
[2] http://mail.openjdk.java.net/pipermail/compiler-dev/2018-September/012504.html
[3] http://hg.openjdk.java.net/jdk/jdk/file/cb94f3a51aed/src/java.base/share/classes/java/util/JumboEnumSet.java#l44
[4] http://hg.openjdk.java.net/jdk/jdk/file/cb94f3a51aed/src/java.base/share/classes/java/util/Collections.java#l1998


More information about the compiler-dev mailing list