[aarch64-port-dev ] Remove unnecessary memory barriers around CAS operations
Andrew Haley
aph at redhat.com
Wed Apr 2 11:00:15 UTC 2014
On 04/02/2014 02:48 AM, D.Sturm wrote:
> Hey,
> could you explain your thinking about why it is safe to remove the second
> memory barrier? We are definitely in agreement on the first one (see
> previous mails on the matter), but I don't see how we can remove the second
> one.
>
> It is my understanding that - using the vocabulary from
> http://gee.cs.oswego.edu/dl/jmm/cookbook.html - a lda[x]r instruction is
> equivalent to load;LoadStore;LoadLoad and the stl[x]r instruction is
> equivalent to LoadStore;StoreStore;store. Consequently to get all necessary
> ordering guarantees would mean inserting a StoreLoad barrier after the
> store instruction (which is equivalent to an AnyAny barrier) - there are
> other options, but in any case we need a StoreLoad barrier *somewhere* (and
> putting it after writes rather than before reads seems more efficient in practice -
> that's what x64 does in HotSpot, at least I think).
AIUI the barrier after stl[x]r is only needed to prevent a following
load from moving before it.
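
To make that concrete, here is a minimal sketch of the store-buffering (Dekker-style) shape, which is the case where a volatile load following a volatile store must not move before it. The class, field, and method names are made up for illustration and are not taken from any patch. Under the Java memory model both threads reading 0 is forbidden, so any implementation of volatile stores and loads, with or without an added fence, has to keep the count at zero.

```java
public class StoreBufferingLitmus {
    // Hypothetical test fields: volatile accesses are the ones under discussion.
    static volatile int x;
    static volatile int y;
    // Plain fields; they are read only after join(), which orders the accesses.
    static int r1;
    static int r2;

    // Runs the litmus test `trials` times and returns how often the
    // outcome forbidden by the Java memory model (r1 == 0 && r2 == 0) was seen.
    static int runTrials(int trials) {
        int forbidden = 0;
        for (int i = 0; i < trials; i++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; }); // volatile store, then volatile load
            Thread b = new Thread(() -> { y = 1; r2 = x; }); // the mirror image
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + runTrials(2000));
    }
}
```

A run that never shows the forbidden outcome proves nothing by itself, since the threads rarely overlap closely enough; a single non-zero count, on the other hand, would be exactly the kind of failing test case being asked for.
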
As Hans Boehm put it,
"Could someone post a test case that they think should work, but that
doesn't work with the acquire/release implementation (without added
fences)? Clearly it does not work as a general purpose fence replacement,
e.g. when used on an object accessed by only one thread. But I hope that
was not intended. It does seem to me that it does preserve the property
that properly synchronized programs are sequentially consistent."
"On ARMv8, I would expect a volatile store to be compiled to a store
release, and a volatile load to be compiled to a load acquire. Period.
Unlike on Itanium, a release store is ordered with respect to a later
acquire load, so the fence between them should not be needed. Thus there
is no a priori reason to expect that a CAS would require a fence either.
"I think that's completely uncontroversial. ARMv8 load acquire and store
release are believed to suffice for Java volatile loads and stores
respectively. Even the fence-less implementation used a release store
exclusive. Unless I'm missing something, examples like this should be
handled correctly by all proposed implementations, whether or not fences
are added.
"As far as I can tell, the only use cases that require the fences to be added
are essentially abuses of CAS as a fence."
If you think otherwise, please join the discussion at
concurrency-interest at cs.oswego.edu: "Semantics of compareAndSwapX".
Andrew.