RFC: linux-aarch64 and LSE support

Andrew Dinn adinn at redhat.com
Tue Oct 4 10:33:38 UTC 2022


On 03/10/2022 20:16, Kim Barrett wrote:
> My question is not whether using an acq-rel operation is sufficient to remove
> the need for a preceding fence, it's whether that's *necessary*? Specifically,
> is a release operation also sufficient, as discussed in the kernel patch?
> https://patchwork.kernel.org/patch/3575821/

Andrew is on PTO at the moment so I'll respond for now. He can (and will 
:-) correct me later if needed.

I don't believe any of Andrew's comments in this thread are questioning 
Will Deacon's statements regarding cases where the preceding fence can 
or cannot be dropped. I read all of them as being about the need for a 
trailing fence.

The issue with AArch64 has always been that a releasing store has the 
potential, per the original spec, to be asymmetric in the way it orders 
visibility. Obviously, the spec guarantees visibility of all stores 
preceding the releasing store in program order before the releasing 
store itself becomes visible. It does not imply any visibility ordering 
guarantee for stores that follow the releasing store in program order. 
Hence the need for a trailing DMB. This asymmetry is something that often 
surprises those coming to AArch64 from a TSO architecture like x86.
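To make the asymmetry concrete, here is a minimal C++ sketch. The variable 
and function names are hypothetical, chosen only for illustration, and the 
instruction names in the comments are what a compiler typically emits for 
AArch64, not something the language guarantees.

```cpp
#include <atomic>

// Hypothetical shared variables, named only for this illustration.
std::atomic<int> payload{0};
std::atomic<int> ready{0};
std::atomic<int> later{0};

// Release store only: 'payload' must be visible before 'ready', but
// nothing orders 'ready' before the store to 'later' that follows it.
void publish_release_only() {
    payload.store(42, std::memory_order_relaxed);
    ready.store(1, std::memory_order_release);   // typically stlr
    later.store(7, std::memory_order_relaxed);   // may become visible before 'ready'
}

// Release store plus a trailing full fence: the fence also orders
// 'ready' before 'later'.
void publish_with_trailing_fence() {
    payload.store(42, std::memory_order_relaxed);
    ready.store(1, std::memory_order_release);            // typically stlr
    std::atomic_thread_fence(std::memory_order_seq_cst);  // typically dmb ish
    later.store(7, std::memory_order_relaxed);
}
```

On x86 both functions behave the same way as far as store visibility is 
concerned, because TSO never reorders stores with other stores.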

As Andrew mentioned, a recent spec change means that the situation is 
now different when the releasing store is also acquiring and is an 
atomic op. In that specific case, the spec change means that the op 
orders visibility of its store wrt both (program order) preceding and 
(program order) following stores.

> That kernel patch argues that for ll/sc atomics only a release operation is
> needed (except for cmpxchg).  And that still seems to hold for the linux
> kernel - it uses ldxr/stlxr with a trailing dmb ish.  Are you claiming
> otherwise?  (For some reason 8261027 used ldaxr rather than ldxr in the new .S
> file, without any justification for that change.)

I didn't read anything he said as claiming otherwise. I assume the use 
of ldaxr was unintended. My view is that it ought not to be needed.

> We also need to understand what forms can/should be used for LSE.  (Maybe a
> release operation with trailing fence works there too?  But let's not go
> there.)  I had much the same reaction as Dean about this, which your reply to
> him seems to agree with.  Specifically, we only need an acq-rel LSE to get the
> behavior we want from LSE atomics.

Yes, this seems to be what the spec now guarantees.

> So I think that for memory_order_conservative we want to use:
> 
> 1. For ll/sc cmpxchg: ldxr/stxr with leading and trailing dmb ish, e.g. a
> Relaxed operation with both leading and trailing fences.

Looks right to me.

> 2. For ll/sc non-cmpxchg: ldxr/stlxr with trailing dmb ish, e.g. a Release
> operation with a trailing fence.

Also looks right to me.

> 3. For LSE: an acq-rel instruction for all operations.

Also looks right to me.
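For reference, the three forms above can be sketched in portable C++ as 
follows. This is only an approximation under stated assumptions: the 
function names are hypothetical, this is not the HotSpot implementation, 
and whether a compiler emits ldxr/stxr loops or LSE instructions for these 
operations depends on the target flags. Note also that form 3 relies on 
the AArch64 spec guarantee discussed above, not on anything the C++ memory 
model promises for acq_rel.

```cpp
#include <atomic>
#include <cstdint>

// 1. ll/sc cmpxchg: relaxed operation with leading and trailing fences.
//    Returns the value observed in 'dest'.
inline int64_t cmpxchg_llsc_sketch(std::atomic<int64_t>& dest,
                                   int64_t expected, int64_t desired) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // leading dmb ish
    dest.compare_exchange_strong(expected, desired,
                                 std::memory_order_relaxed,
                                 std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing dmb ish
    return expected;  // on failure this was updated to the current value
}

// 2. ll/sc non-cmpxchg: release operation with a trailing fence.
inline int64_t fetch_add_llsc_sketch(std::atomic<int64_t>& dest, int64_t v) {
    int64_t old = dest.fetch_add(v, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing dmb ish
    return old;
}

inline int64_t xchg_llsc_sketch(std::atomic<int64_t>& dest, int64_t v) {
    int64_t old = dest.exchange(v, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing dmb ish
    return old;
}

// 3. LSE: a single acq-rel operation, no separate fences.
inline int64_t fetch_add_lse_sketch(std::atomic<int64_t>& dest, int64_t v) {
    return dest.fetch_add(v, std::memory_order_acq_rel);
}
```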

> That looks like what the linux kernel is using.  It also mostly agrees with
> a recent change to gcc outline-atomics support:
> 
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bc25483c055d62f94f8c289f80843dda3c4a6ff4
> 2022-05-13  Sebastian Pop  <spop at amazon.com>
> PR target/105162
> (included in gcc12.2)
> 
> Unfortunately, that change doesn't implement ll/sc cmpxchg the way we want
> (item 1), instead using ldxr/stlxr+dmb.  That's wrong, according to the
> afore-referenced linux kernel patch (an analysis I agree with). (There is also
> the problem of the __sync suite not having something for Atomic::xchg.)

Hmm, that is ... unfortunate :-/

> So how do we get the code we want?  There doesn't seem to be a way for us to
> use gcc intrinsics.  The "legacy" __sync_xxx operations are documented as
> "full barriers" as we want, and the above referenced change comes pretty
> close.  But we couldn't use that right now even if that change perfectly
> matched what we want, as it is far too new.  (After digging into some of this
> I have a great deal of sympathy for ErikO's position here:
> https://mail.openjdk.org/pipermail/hotspot-dev/2019-November/039931.html)

Yeah, Erik's point was always quite telling and it definitely seems to 
bite here.

> So I think we are (at least for now) stuck with rolling our own.  What we have
> has some problems, and I think can be improved in various ways (for example, I
> think it is possible to dispense with runtime stub generation).  I'm planning
> to offer some PRs along those lines.

Ok, thanks for the update.

regards,


Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


