Memory ordering properties of Atomic::r-m-w operations

Tue Nov 8 10:35:17 UTC 2016

On 8/11/2016 8:18 PM, Andrew Haley wrote:
> On 08/11/16 01:11, David Holmes wrote:
>> On 6/11/2016 8:54 PM, Andrew Haley wrote:
>>> On 05/11/16 18:43, David Holmes wrote:
>>>> Forking new discussion from:
>>>>
>>>> RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
>>>>
>>>> On 1/11/2016 7:44 PM, Andrew Haley wrote:
>>>>> On 31/10/16 21:30, David Holmes wrote:
>>
>>>  if you have
>>>
>>> store_relaxed(a)
>>> load_seq_cst(b)
>>> store_seq_cst(c)
>>> load_relaxed(d)
>>>
>>> there's nothing to prevent
>>>
>>> load_seq_cst(b)
>>> load_relaxed(d)
>>> store_relaxed(a)
>>> store_seq_cst(c)
>>>
>>> It is true that neither store a nor load d have moved across this
>>> operation, but they have exchanged places.  As far as GCC is concerned
>>> this is a correct implementation, and it does meet the requirement of
>>> sequential consistency as defined in the C++ memory model.
>>
>> It does? Then it emphasises what I just said about not knowing what it
>> means to implement an operation with seq_cst semantics.
>
> I take your point, but seq_cst is not a real mystery, it's just a
> matter of looking it up: it's all defined in the C++11 standard.  And
> it's not significantly different from Java volatile.

I have looked at it of course, but still find it rather "mysterious".

>> I would have expected full ordering of all loads and stores to get
>> "sequential consistency".
>
> Why?  There are only two sequentially-consistent loads and stores in
> that block of code.  Of course those two have a total order.  But you
> surely wouldn't expect a sequentially-consistent store to be ordered
> with respect to a relaxed load.

I guess I think of sequential consistency as a global property of a
system, not something that applies only to the atomic operations.
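For comparison with the C++11 model Andrew refers to, here is his four-operation sequence written against std::atomic. The Shared and Observed types and run_sequence are hypothetical names used only for illustration. The comments note which reorderings the standard permits; a single-threaded run cannot, of course, exhibit the reordering itself.

```cpp
#include <atomic>

// Hypothetical shared variables standing in for a, b, c and d in the example.
struct Shared {
    std::atomic<int> a{0}, b{0}, c{0}, d{0};
};

// Values read by the two loads in the sequence.
struct Observed {
    int b;
    int d;
};

// Only the two seq_cst operations take part in the single total order S.
// A seq_cst load is (at least) an acquire and a seq_cst store is (at least)
// a release, so the model does not stop:
//  - the relaxed store to a from becoming visible after the load of b, or
//  - the relaxed load of d from being satisfied before the store to c.
// The seq_cst load of b and seq_cst store to c do stay in program order.
Observed run_sequence(Shared& s) {
    s.a.store(1, std::memory_order_relaxed);        // store_relaxed(a)
    int rb = s.b.load(std::memory_order_seq_cst);   // load_seq_cst(b)
    s.c.store(1, std::memory_order_seq_cst);        // store_seq_cst(c)
    int rd = s.d.load(std::memory_order_relaxed);   // load_relaxed(d)
    return Observed{rb, rd};
}
```

Getting the "global" ordering David expects would require making every access in the sequence seq_cst, not just the middle two.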

>>> Ouch.  Yes, I agree that something needs fixing.  That comment:
>>>
>>> // Use release_store_fence to update values like the thread state,
>>> // where we don't want the current thread to continue until all our
>>> // prior memory accesses (including the new thread state) are visible
>>> // to other threads.
>>>
>>> ... seems very unhelpful, at least because a release fence (using
>>> conventional terminology) does not have that property: a release
>>> fence is only LoadStore|StoreStore.
>>
>> In release_store_fence the release and fence are distinct memory
>> ordering components. It is not a store combined with a "release
>> fence" but a store between a "release" and a "fence". And critically
>> in hotspot that "fence" must have visibility guarantees to ensure
>> correctness of Dekker-duality algorithms.
>
> Ah, that is a slightly misleading name.  The "_fence" at the end of
> the name is really a StoreLoad fence, got it.  I noticed that once
> before, but I'd forgotten.  I guess what's intended here is a
> sequentially-consistent store.

It is intended to be:

release(); store; fence();

but might be implementable in a more efficient manner when combined in a 
single function.
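In C++11 terms, the "release(); store; fence();" decomposition might be sketched with stand-alone fences as below. release_store_fence_sketch is a hypothetical name and this is not HotSpot's OrderAccess code; C++ defines its fences through synchronizes-with and the total order S rather than as LoadStore/StoreStore combinations, so the mapping is only approximate.

```cpp
#include <atomic>

// Hypothetical C++11 rendering of "release(); store; fence();".
inline void release_store_fence_sketch(std::atomic<int>& dest, int value) {
    // release(): roughly, earlier loads and stores may not be reordered
    // after the following store.
    std::atomic_thread_fence(std::memory_order_release);
    // The store itself; the surrounding fences supply the ordering.
    dest.store(value, std::memory_order_relaxed);
    // fence(): a full fence, including StoreLoad, so (in hardware terms)
    // later loads cannot be satisfied before the store is visible.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

A combined implementation is free to use something cheaper than two separate fences; on x86, for example, an exchange instruction on the destination already acts as a full barrier.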

I have a problem with referring to a "storeload fence". StoreLoad is one
form of memory barrier; a full fence represents all four forms to me.
Terminology is a disaster in this field, unfortunately: one
architecture's barrier is another's fence. :(
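Why the trailing fence needs StoreLoad ordering shows up in the store-buffering pattern at the heart of Dekker-style handshakes. In the sketch below (both_saw_zero and the two flags are hypothetical), each thread raises its own flag, executes a full fence, then reads the other thread's flag. The C++11 rules for seq_cst fences forbid both threads reading 0; with only release/acquire ordering, that outcome would be permitted.

```cpp
#include <atomic>
#include <thread>

// Hypothetical Dekker-style litmus test. Returns true only if both threads
// failed to see the other's flag, which the seq_cst fences rule out.
bool both_saw_zero() {
    std::atomic<int> flag1{0}, flag2{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        flag1.store(1, std::memory_order_relaxed);            // raise own flag
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence (StoreLoad)
        r1 = flag2.load(std::memory_order_relaxed);           // read other flag
    });
    std::thread t2([&] {
        flag2.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = flag1.load(std::memory_order_relaxed);
    });

    t1.join();  // join makes r1 and r2 safe to read here
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Running it repeatedly should never report true. Weakening the fences to release or acquire removes that guarantee, although whether the bad outcome is actually observed depends on the hardware.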

>> Note the equivalence of release() with LoadStore|StoreStore is a
>> definition within orderAccess.hpp; it is not a general equivalence.
>
> OK.  It would certainly be nice if HotSpot could move to using
> standard terminology.  Then, in time, we could just use the C++11
> atomics.

The stand-alone (unbound) release() and acquire() are defined as they
are so that they can be associated with a subsequent store, or a
preceding load, in cases where we cannot access the variable directly
to apply a release_store or load_acquire operation. This is somewhat
independent of the atomic API.
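The unbound release()/acquire() pattern has a close C++11 analogue in std::atomic_thread_fence: a release fence pairs with a later relaxed store, and an acquire fence pairs with an earlier relaxed load. The message_passing function below is a hypothetical illustration; note that in C++ the flag itself must still be an atomic object to avoid a data race, even though the fences carry the ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical message-passing example using stand-alone fences.
int message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);  // like release()
        ready.store(true, std::memory_order_relaxed);         // the subsequent store
    });

    int seen = 0;
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {      // the preceding load
            // spin until the producer's store is visible
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // like acquire()
        seen = payload;  // the fence pair synchronizes, so this reads 42
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```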

David