Primitive boolean array packing

Sun Oct 7 12:38:21 UTC 2018

On Sun, 7 Oct 2018 at 13:51, Roman Kennke <rkennke at redhat.com> wrote:
>
> >>> I didn't search much if such experiments have already been
> >>> accomplished, but I'd like to take the temperature of this feature as
> >>> completing the implementation represents a significant amount of work.
> >>> Is this something that is worth exploring?
> >>
> >> The most problematic part, in my mind, is that JMM requires the absence of word tearing for array
> >> element accesses. See here: https://shipilev.net/blog/2014/jmm-pragmatics/#_part_ii_word_tearing
> >>
> >> With densely packed boolean[], you'd need to convert plain stores to locked/CAS-loops to support
> >> this, I think, which raises all sorts of performance questions. Choosing between boolean[] and
> >> BitSet is basically choosing between stricter/relaxed concurrency guarantees vs dense/sparse footprint.
> >
> > But you get a lot of performance conflicts between cores having to share
> > cache lines anyway. Maybe we should do some performance experiments: we
> > wouldn't need to do all of the C2 work, just write a little C++ and asm
> > code and measure what happens under some plausible conditions. We'd have
> > to try both contended and uncontended situations.
>
> Are boolean arrays even common enough to make a difference?
>
> Roman

Thanks Aleksey, you're absolutely right, but a programmer could still
disable this feature and fall back to regular boolean arrays if
necessary.

Nevertheless, if memory consumption is the priority and boolean
packing is a necessity, the problem you mentioned has two possible
solutions:
1) either the JVM locks (or CASes) each 8-bit block when one of its
boolean elements is written, so that concurrent writes to neighbouring
elements are not lost, in which case some profiling would be necessary,
as Andrew suggested
2) or the JVM no longer guarantees safe concurrent access to distinct
boolean array elements when packing is enabled, and delegates the
synchronization to the programmer.
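
To make the two options concrete, here is a rough Java sketch of what
they would amount to if written by hand on top of a byte[]. It is only
an illustration, not a proposal for how HotSpot would implement it: the
class name PackedBooleanArray and the methods setPlain/setAtomic are
made up for this example, and option 1 is shown as a per-byte CAS loop
through a VarHandle rather than an actual lock.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration only; not how HotSpot lays out boolean[].
final class PackedBooleanArray {
    // Gives atomic access to individual elements of a byte[].
    private static final VarHandle BYTES =
            MethodHandles.arrayElementVarHandle(byte[].class);

    private final byte[] bits;   // 8 booleans packed into each byte
    private final int length;    // number of logical boolean elements

    PackedBooleanArray(int length) {
        this.length = length;
        this.bits = new byte[(length + 7) >>> 3];
    }

    boolean get(int i) {
        check(i);
        return (bits[i >>> 3] & (1 << (i & 7))) != 0;
    }

    // Option 2: plain read-modify-write of the containing byte.
    // Two threads writing distinct elements that share a byte can
    // overwrite each other's update (word tearing), so the caller
    // has to synchronize.
    void setPlain(int i, boolean value) {
        check(i);
        int idx = i >>> 3;
        int mask = 1 << (i & 7);
        if (value) {
            bits[idx] |= mask;
        } else {
            bits[idx] &= ~mask;
        }
    }

    // Option 1: CAS loop on the containing byte. Updates to
    // neighbouring elements are never lost, at the cost of an atomic
    // instruction (and possible retries) on every store.
    void setAtomic(int i, boolean value) {
        check(i);
        int idx = i >>> 3;
        int mask = 1 << (i & 7);
        byte old;
        byte next;
        do {
            old = (byte) BYTES.getVolatile(bits, idx);
            next = (byte) (value ? (old | mask) : (old & ~mask));
        } while (!BYTES.compareAndSet(bits, idx, old, next));
    }

    private void check(int i) {
        if (i < 0 || i >= length) {
            throw new ArrayIndexOutOfBoundsException(i);
        }
    }
}
```

Measuring setPlain against setAtomic, both contended and uncontended,
would be one cheap way to get a first idea of the cost before touching
C2.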

While the first solution is safer, the second keeps the additional
performance cost to a minimum, because synchronization is applied only
where the programmer actually needs it, which may partly address
Roman's question...
Bernard