[aarch64-port-dev ] Remove unnecessary memory barriers around CAS operations
D.Sturm
D.Sturm42 at gmail.com
Wed Apr 2 01:48:30 UTC 2014
Hey,
could you explain your reasoning for why it is safe to remove the second
memory barrier? We are definitely in agreement on the first one (see
previous mails on the matter), but I don't see how we can remove the second
one.
It is my understanding that - using the vocabulary from
http://gee.cs.oswego.edu/dl/jmm/cookbook.html - a lda[x]r instruction is
equivalent to load;LoadStore;LoadLoad and the stl[x]r instruction is
equivalent to LoadStore;StoreStore;store. Consequently, getting all the
necessary ordering guarantees would mean inserting a StoreLoad barrier after
the store instruction (which on AArch64 amounts to an AnyAny barrier) - there
are other options, but in any case we need a StoreLoad barrier *somewhere*,
and putting it after writes rather than before reads seems more efficient in
practice - that's what x64 does in HotSpot, at least I think.
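
To make the placement concrete, here is a minimal, hand-written sketch of
what I mean (not the HotSpot code - just an illustration, assuming GCC/Clang
inline asm on AArch64, with made-up names): a CAS loop built from
ldaxr/stlxr where the trailing dmb ish supplies the StoreLoad ordering that
stl[x]r alone does not:

  #include <cstdint>

  // Word-sized CAS: returns true if *addr held 'expected' and now holds 'desired'.
  bool cas_word(uint64_t *addr, uint64_t expected, uint64_t desired) {
    uint64_t observed;
    uint32_t failed;
    asm volatile(
        "1: ldaxr %[obs], %[mem]            \n"  // load-acquire exclusive
        "   cmp   %[obs], %[exp]            \n"
        "   b.ne  2f                        \n"  // unexpected value: give up
        "   stlxr %w[fail], %[des], %[mem]  \n"  // store-release exclusive
        "   cbnz  %w[fail], 1b              \n"  // lost reservation: retry
        "   dmb   ish                       \n"  // trailing StoreLoad (AnyAny) barrier
        "2:                                 \n"
        : [obs] "=&r"(observed), [fail] "=&r"(failed), [mem] "+Q"(*addr)
        : [exp] "r"(expected), [des] "r"(desired)
        : "cc", "memory");
    return observed == expected;
  }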
To quote the cookbook: "The sequence: Store1; StoreLoad; Load2
ensures that Store1's data are made visible to other processors (i.e.,
flushed to main memory) before data accessed by Load2 and all subsequent
load instructions are loaded. StoreLoad barriers protect against a
subsequent load incorrectly using Store1's data value rather than that from
a more recent store to the same location performed by a different
processor."
while the A64 reference has to say the following about a store-release:
"A store-release will be observed by each observer after that observer
observes any loads or stores that appear in program order before the
store-release, *but says nothing about loads and stores appearing after the
store-release* (emphasis mine)"
So I don't think we can remove the AnyAny barrier after volatile writes, and
since a CAS has to fulfil the same requirements as a volatile read plus a
volatile write, it seems to me that the second barrier is necessary.
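
For a concrete instance of the hazard the cookbook paragraph describes,
consider the usual Dekker-style store-then-load pattern (a C++11 sketch with
hypothetical names; seq_cst is what buys the StoreLoad ordering here, while
release/acquire alone - which is all stlr/ldar promise - does not):

  #include <atomic>

  std::atomic<int> x{0}, y{0};

  int thread1() {
    x.store(1, std::memory_order_seq_cst);    // Store1
    return y.load(std::memory_order_seq_cst); // Load2
  }

  int thread2() {
    y.store(1, std::memory_order_seq_cst);
    return x.load(std::memory_order_seq_cst);
  }

  // With seq_cst (store; StoreLoad; load) at most one of the two threads can
  // return 0. If the stores were only release and the loads only acquire -
  // the guarantee stlr/ldar give us - both threads could return 0, which is
  // exactly the store/load reordering the quoted paragraph warns about.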
-- Daniel
On 1 April 2014 18:28, Andrew Haley <aph at redhat.com> wrote:
> After much careful reading of the AArch64 spec and many runs of jcstress,
> I have decided that it is safe to remove the barriers around CAS.
>
> A more thorough reworking of memory barriers is on my list of things
> to do, but it needs some HotSpot changes from upstream that aren't yet
> in our code base.
>
> Andrew.
>
>
> # HG changeset patch
> # User aph
> # Date 1396369343 14400
> # Tue Apr 01 12:22:23 2014 -0400
> # Node ID 780ed75ea21a727949abcfe57ea5544f7a1ca22c
> # Parent e176eb39c5f53127fe18a7e528ca6bc6f0c23cea
> Remove unnecessary memory barriers around CAS operations
>
> diff -r e176eb39c5f5 -r 780ed75ea21a
> src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp
> --- a/src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp Mon Mar 31
> 10:20:26 2014 -0400
> +++ b/src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp Tue Apr 01
> 12:22:23 2014 -0400
> @@ -1584,9 +1584,8 @@
> Label retry_load, nope;
> // flush and load exclusive from the memory location
> // and fail if it is not what we expect
> - __ membar(__ AnyAny);
> __ bind(retry_load);
> - __ ldxrw(rscratch1, addr);
> + __ ldaxrw(rscratch1, addr);
> __ cmpw(rscratch1, cmpval);
> __ cset(rscratch1, Assembler::NE);
> __ br(Assembler::NE, nope);
> @@ -1603,9 +1602,8 @@
> Label retry_load, nope;
> // flush and load exclusive from the memory location
> // and fail if it is not what we expect
> - __ membar(__ AnyAny);
> __ bind(retry_load);
> - __ ldxr(rscratch1, addr);
> + __ ldaxr(rscratch1, addr);
> __ cmp(rscratch1, cmpval);
> __ cset(rscratch1, Assembler::NE);
> __ br(Assembler::NE, nope);
> diff -r e176eb39c5f5 -r 780ed75ea21a
> src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp Mon Mar 31
> 10:20:26 2014 -0400
> +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp Tue Apr 01
> 12:22:23 2014 -0400
> @@ -1817,13 +1817,11 @@
> bind(retry_load);
> // flush and load exclusive from the memory location
> // and fail if it is not what we expect
> - membar(AnyAny);
> - ldxr(tmp, addr);
> + ldaxr(tmp, addr);
> cmp(tmp, oldv);
> br(Assembler::NE, nope);
> // if we store+flush with no intervening write tmp wil be zero
> - stxr(tmp, newv, addr);
> - membar(AnyAny);
> + stlxr(tmp, newv, addr);
> cbzw(tmp, succeed);
> // retry so we only ever return after a load fails to compare
> // ensures we don't return a stale value after a failed write.
> @@ -1847,13 +1845,11 @@
> bind(retry_load);
> // flush and load exclusive from the memory location
> // and fail if it is not what we expect
> - membar(AnyAny);
> - ldxrw(tmp, addr);
> + ldaxrw(tmp, addr);
> cmp(tmp, oldv);
> br(Assembler::NE, nope);
> // if we store+flush with no intervening write tmp wil be zero
> - stxrw(tmp, newv, addr);
> - membar(AnyAny);
> + stlxrw(tmp, newv, addr);
> cbzw(tmp, succeed);
> // retry so we only ever return after a load fails to compare
> // ensures we don't return a stale value after a failed write.
>