RFC: linux-aarch64 and LSE support

Mon Oct 3 19:16:31 UTC 2022

> On Sep 20, 2022, at 7:15 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
> 
> On 9/20/22 11:32, Kim Barrett wrote:
> > There is a big comment in front of the new stub generation code, talking about
> > how an acq-rel operation doesn't need a preceding fence when using LSE
> > atomics.  I can see how that's very useful for cmpxchg.  (And the comment
> > mostly discusses cmpxchg.)  But I'm not certain of its relevance for other
> > operations.
> 
> It's the same for all ops. None of them need a preceding fence.

My question is not whether using an acq-rel operation is sufficient to remove
the need for a preceding fence, but whether it's *necessary*.  Specifically,
is a release operation also sufficient, as discussed in the kernel patch?
https://patchwork.kernel.org/patch/3575821/

That kernel patch argues that for ll/sc atomics only a release operation is
needed (except for cmpxchg).  And that still seems to hold for the Linux
kernel - it uses ldxr/stlxr with a trailing dmb ish.  Are you claiming
otherwise?  (For some reason 8261027 used ldaxr rather than ldxr in the new .S
file, without any justification for that change.)

We also need to understand what forms can/should be used for LSE.  (Maybe a
release operation with trailing fence works there too?  But let's not go
there.)  I had much the same reaction as Dean about this, which your reply to
him seems to agree with.  Specifically, an acq-rel LSE instruction is all we
need to get the behavior we want from LSE atomics.

So I think that for memory_order_conservative we want to use:

1. For ll/sc cmpxchg: ldxr/stxr with leading and trailing dmb ish, i.e. a
Relaxed operation with both leading and trailing fences.

2. For ll/sc non-cmpxchg: ldxr/stlxr with trailing dmb ish, i.e. a Release
operation with a trailing fence.

3. For LSE: an acq-rel instruction for all operations.
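
As a rough illustration of the three shapes above, here they are written with
C++11 atomics.  This is only a sketch of the ordering structure, not a
proposal for how HotSpot should implement them: the function names are made
up for this example, and whether a compiler turns each one into exactly the
instruction sequence named in its comment depends on the compiler version and
flags (-march, outline-atomics), so that mapping is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Shape 1 (ll/sc cmpxchg): a relaxed compare-and-swap with a full fence on
// both sides.  Assumed to correspond to dmb ish; ldxr/stxr loop; dmb ish.
inline uint64_t cmpxchg_fenced(std::atomic<uint64_t>& dest,
                               uint64_t expected, uint64_t desired) {
  std::atomic_thread_fence(std::memory_order_seq_cst);  // leading fence
  dest.compare_exchange_strong(expected, desired,
                               std::memory_order_relaxed,
                               std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing fence
  // On failure 'expected' was overwritten with the current value; on
  // success it still equals the old value.  Either way it is the old value.
  return expected;
}

// Shape 2 (ll/sc non-cmpxchg): a release read-modify-write followed by a
// full fence.  Assumed to correspond to ldxr/stlxr loop; dmb ish.
inline uint64_t fetch_add_release_fenced(std::atomic<uint64_t>& dest,
                                         uint64_t value) {
  uint64_t old = dest.fetch_add(value, std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing fence
  return old;
}

// Shape 3 (LSE): a single acq-rel read-modify-write with no separate fence.
// Assumed to correspond to an instruction such as ldaddal when LSE is enabled.
inline uint64_t fetch_add_acq_rel(std::atomic<uint64_t>& dest,
                                  uint64_t value) {
  return dest.fetch_add(value, std::memory_order_acq_rel);
}

// Single-threaded usage check of the return values only; it says nothing
// about the ordering guarantees themselves.
inline void example_usage() {
  std::atomic<uint64_t> counter{5};
  uint64_t old = cmpxchg_fenced(counter, 5, 7);
  assert(old == 5 && counter.load() == 7);
}
```

Note that a single source-level memory order cannot express "release plus
trailing fence for ll/sc, but acq-rel with no fence for LSE" when the choice
between the two is made at run time, which is part of why the intrinsics
question below matters.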

That looks like what the Linux kernel is using.  It also mostly agrees with
a recent change to gcc outline-atomics support:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bc25483c055d62f94f8c289f80843dda3c4a6ff4
2022-05-13  Sebastian Pop  <spop at amazon.com>
PR target/105162
(included in gcc 12.2)

Unfortunately, that change doesn't implement ll/sc cmpxchg the way we want
(item 1), instead using ldxr/stlxr+dmb.  That's wrong, according to the
Linux kernel patch referenced earlier (an analysis I agree with).  (There is
also the problem of the __sync suite not having something for Atomic::xchg.)

So how do we get the code we want?  There doesn't seem to be a way for us to
use gcc intrinsics.  The "legacy" __sync_xxx operations are documented as
"full barriers" as we want, and the above-referenced change comes pretty
close.  But we couldn't use that right now even if that change perfectly
matched what we want, as it is far too new.  (After digging into some of this
I have a great deal of sympathy for ErikO's position here:
https://mail.openjdk.org/pipermail/hotspot-dev/2019-November/039931.html)
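
For reference, a minimal sketch of what using the legacy __sync builtins
would look like.  The wrapper names are hypothetical and this is not code
from HotSpot; the builtins themselves are GCC/Clang extensions.  It also
shows the xchg gap: the nearest builtin, __sync_lock_test_and_set, is
documented by GCC as an acquire barrier rather than a full barrier.

```cpp
#include <cassert>
#include <cstdint>

// Documented by GCC as a full barrier.  Returns the value that was in
// *dest before the operation.
inline uint64_t sync_cmpxchg(volatile uint64_t* dest,
                             uint64_t compare, uint64_t exchange) {
  return __sync_val_compare_and_swap(dest, compare, exchange);
}

// Documented by GCC as a full barrier.  Returns the old value.
inline uint64_t sync_fetch_add(volatile uint64_t* dest, uint64_t value) {
  return __sync_fetch_and_add(dest, value);
}

// The closest thing to an exchange in the __sync family.  GCC documents it
// as an acquire barrier only, so it does not provide the conservative
// ordering wanted for Atomic::xchg.
inline uint64_t sync_xchg_acquire_only(volatile uint64_t* dest,
                                       uint64_t value) {
  return __sync_lock_test_and_set(dest, value);
}

// Single-threaded usage check of the return values only.
inline void sync_example_usage() {
  volatile uint64_t word = 1;
  uint64_t old = sync_cmpxchg(&word, 1, 2);
  assert(old == 1 && word == 2);
}
```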

So I think we are (at least for now) stuck with rolling our own.  What we have
has some problems, and I think can be improved in various ways (for example, I
think it is possible to dispense with runtime stub generation).  I'm planning
to offer some PRs along those lines.
