RFC: linux-aarch64 and LSE support
dean.long at oracle.com
Fri Sep 23 11:40:03 UTC 2022
I have to admit it is a bit surprising and counter-intuitive to me to
not treat the CASAL as atomic. If the memory model allows the 2nd store
to be moved "inside" the CASAL, then that means it might be impossible
to provide a total order of the AArch64 instructions that reproduces the
observed memory effects, and that makes me wonder if it would break
tools like "rr".
dl
On 9/23/22 1:51 AM, Andrew Haley wrote:
> On 9/23/22 00:40, dean.long at oracle.com wrote:
> > Also, since "Barrier-ordered-before" describes when RW1 "happened
> > before" RW2 (or RW2 "happened after" RW1, right?), I don't understand
> > why an additional "barrier-ordered-after" would be desired.
>
> This was explained by the comment:
>
> // This was checked by using the herd7 consistency model simulator
> // (http://diy.inria.fr/) with this test case:
> //
> // AArch64 LseCas
> // { 0:X1=x; 0:X2=y; 1:X1=x; 1:X2=y; }
> // P0           | P1;
> // LDR W4, [X2] | MOV W3, #0;
> // DMB LD       | MOV W4, #1;
> // LDR W3, [X1] | CASAL W3, W4, [X1];
> //              | DMB ISH;
> //              | STR W4, [X2];
> // exists
> // (0:X3=0 /\ 0:X4=1)
>
> Here, if you remove the trailing DMB, the store to [x2] can become visible
> before the store to [x1].
>
> This happens because the CASAL followed by the STR has the memory effect of
>
> load [x1]; acquire; release; store [x1]; store [x2]
>
> And it's obvious that there is no ordering between the two stores. The
> fact that the operation on [x1] is atomic has no bearing on when the
> stores to [x1] and [x2] become visible.
>
> In order to prevent this reordering, we need StoreStore|StoreLoad after
> the CASAL.
>
> It's very instructive to run this test on the consistency model simulator.
> This interpretation was confirmed by the author of the Arm memory model.
>
> HOWEVER, the spec has been changed again, and it may well be that we no
> longer need the trailing barrier, because of the section you quoted:
>
> A memory read or write effect RW1 is Barrier-ordered-before a memory
> read or write effect RW2 from the same Observer if and only if RW1
> appears in program order before RW2 and any of the following cases
> apply:
>
> • RW1 is a memory write effect W1 and is generated by an atomic
> instruction with both Acquire and Release semantics.
>
> ... which looks like the spec has been tightened enough that we do not
> need the trailing barrier. Having said that, I think we still may need
> it for non-LSE, at least in theory.
>
More information about the hotspot-dev
mailing list