RFR: 8233073: Make BitMap accessors more memory ordering friendly

Tue Oct 29 04:19:40 UTC 2019

CDS also uses BitMap -- always in single threaded code. So as long as 
single-threaded code doesn't get slowed down by this patch, I have no 
objection.

Thanks
- Ioi

On 10/28/19 6:54 PM, Kim Barrett wrote:
> Should this review be happening on hotspot-dev rather than
> hotspot-gc-dev?  GC is not the only BitMap client; compiler uses them
> too (and generally rather differently).
>
>> On Oct 28, 2019, at 11:29 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>
>> Hi,
>>
>> I have need for accessors on BitMap being more explicitly memory ordering aware, to fix a bug in ZGC for AArch64 (https://bugs.openjdk.java.net/browse/JDK-8233061).
>> In particular, I need failed bit sets to still have acquire semantics, and I need a getter with acquire semantics.
>>
>> My intention is to solve the problem by making the relevant BitMap accessors accept explicitly passed in memory ordering parameters, and utilize them. I draw the line of conservativeness at supporting IRIW-consistent loads. Having spent a great deal of time finding a single algorithm that breaks due to IRIW-consistency violations, and knowing the complexity of algorithms that actually can break due to that, I would be *very* surprised if we got anywhere close to that. Therefore, acquiring loads are the most conservative loads I support. This is explicitly stated in the API, so that anyone that actually relies on IRIW consistency in the future can reconsider that, and add a mode that fences before loads on nMCA machines.
>>
>> The main points of controversy with this patch, where I expect people to have wildly different opinions and hopefully get at least a little bit upset are the following:
>>
>> 1) For the same reason that our implementation of Atomic::cmpxchg does not supply both one ordering for success and one for failed CAS, unlike the C++11 atomic counter part, I do not do so either in the par_set_bit API. In the Atomic API, this was very much intentional, because it is tricky to reason about the subtle effects of having relaxed failed CAS and conservative success. In fact, it's a bug of precisely that nature I am chasing. Therefore, I wish to transfer that same reasoning to the par_set_bit API, and not allow passing in a weaker failing memory ordering. A consequence of this is that I have made the uses of this API more conservative for failed bit flips than it was in the past. However, this new API allows relaxing the real pain point of the API: the success case (with it's bi-directional full fencing semantics). So I expect it can be applied to make RMO architectures happier where it really matters in the end. However, I will not attempt to prove that relaxing these calls is okay in various places with this patch: that is outside of my scope, I'm merely adding API hooks for allowing that.
>>
>> 2) The default strength on the getter is memory_order_relaxed and not memory_order_conservative. After looking at uses, I realize it's used mostly in single threaded contexts by compiler code, and there is seemingly only a single use in the VM that cares about having acquire (ZGC, and it's broken today). While letting the frequency of uses decide what is the default rather than what is safest is not something I would normally do, it does feel like since the norm is so vastly in favour of the relaxed variant, I don't want to let the one ZGC use case clutter half the VM with explicitly relaxing the load. I am okay with reverting that decision if people want me to.
>>
>> CR:
>> http://cr.openjdk.java.net/~eosterlund/8233073/webrev.00/
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8233073
>>
>> Thanks,
>> /Erik
> ------------------------------------------------------------------------------
> src/hotspot/share/utilities/bitMap.hpp
> 207   bool at(idx_t index, atomic_memory_order memory_order = memory_order_relaxed) const;
>
> My initial reaction here is that I'd prefer adding par_at() rather
> than giving at() a memory order argument.  This would also address the
> question of what the default should be.  For at(), it's nonatomic.
> For par_at() it's acquire.
>
> That would also avoid imposing volatile ordering on at().
>
> As you said, existing uses of at() are relaxed/nonatomic.  The code
> rearrangement for MarkBitMap::is_marked() makes me wonder if any of
> the calls should be acquire ordered, but obviously none are now...
>
> ------------------------------------------------------------------------------
> src/hotspot/share/utilities/bitMap.hpp
> 205   // The memory ordering goes up to memory_order_acquire, but not higher. It is
> 206   // assumed that users of the BitMap API will never rely on IRIW consistency.
>
> I think what this means is that memory_order_seq_cst
> (memory_order_conservative in HotSpot) isn't supported? So just as we
> only have Atomic::load (relaxed) and Atomic::load_acquire (acquire).
> That seems okay. But if we aren't going to support the stronger
> semantics, I don't think this should permit the corresponding memory
> order value.
>
> C++11 specifies that atomic load operations cannot have a memory order
> of memory_order_release or memory_order_acq_rel. (Similarly, store
> operations cannot have a memory order of memory_order_consume,
> memory_order_acquire, or memory_order_acq_rel. That isn't relevant for
> this change, as all the modifying operations here are RMW.)
>
> So I think we should just be explicit that only relaxed and acquire
> are supported here.  (And actually make that true; see next comment.)
>
> ------------------------------------------------------------------------------
> src/hotspot/share/utilities/bitMap.inline.hpp
>   55 inline bool BitMap::at(idx_t index, atomic_memory_order memory_order) const {
> ...
>   58   return (load_word_ordered(addr, memory_order) & bit_mask(index)) != 0;
>
> This is using the load_word_ordered helper, but the behavior of that
> function is designed to support the RMW operations, and I think isn't
> really right for at() (see previous comment). The simplest solution to
> get what I'm suggesting would be to add an assert here that the
> memory_order is either relaxed or acquire.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/markBitMap.inline.hpp
> 71   return _bm.at(addr_to_offset(addr), memory_order_relaxed);
>
> The memory order argument isn't needed with the current default, and
> wouldn't even be permitted with the above suggestion add par_at.
>
> ------------------------------------------------------------------------------
>
> I'm not understanding part of the problem description though.  You say
>
>    ... have made the uses of this API more conservative for failed bit
>    flips than it was in the past.
>
> But the pre-existing unordered cases in the setting functions (e.g.
> don't go through cmpxchg) are those where the bit is already set to
> the desired value, so there's no failure to change the bit involved.
> It seems reasonable to me that an acquire (at least) is usually
> desirable on that path, for reasons similar to why one wants an
> acquire on the outside-the-lock test when using the Double Checked
> Locking pattern. But that's not what was said, so I'm not sure I'm
> understanding the point.
>
> ------------------------------------------------------------------------------
>
> The par_xxx_range operations are not being directly modified by this
> change. When only 1 bit is actually involved, they'll delegate to the
> conservative single-bit operations, so are changed to pick up the
> acquire on the already set to the desired value path. Otherwise, they
> always go through conservative cmpxchg as before.  That all seems fine.
>
> ------------------------------------------------------------------------------
>