RFC: linux-aarch64 and LSE support

Tue Sep 20 10:32:47 UTC 2022

> On Sep 15, 2022, at 4:01 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
> 
> On 9/15/22 01:16, Kim Barrett wrote:
>> I *think* it can be done as a sequence of commits to revert prior commits,
>> followed by a couple of newly developed commits to reinstate features like
>> support for relaxed atomics.
> 
> I expect so.

The reversions are indeed relatively straightforward.  There is one set of
conflicts for one of the commits, due to unrelated nearby changes in one file,
but that's relatively easy to deal with.

But...

Digging into this deeper, I've run across some problems.  Of course, it's
about memory_order_conservative.

I know about the discussion here:
https://mail.openjdk.org/pipermail/hotspot-dev/2019-November/039912.html
(long discussion about implementing memory_order_conservative)

which refers to this:
https://patchwork.kernel.org/patch/3575821/
This is the reference leading to the pre-JDK-8261027 implementation.

To summarize, for memory_order_conservative with ll/sc-style atomics
- For cmpxchg, use <fence> <relaxed-cmpxchg> <fence>
- For other ops, use <release-op> <fence>
That's what we were requesting before JDK-8261027, via __atomic intrinsics.
But -moutline-atomics changed things so we were potentially using the
corresponding LSE atomic instead of the ll/sc-style.
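
For concreteness, here is a minimal sketch of the two patterns summarized
above, written with the GCC/Clang __atomic builtins.  The function names are
hypothetical (they are not the HotSpot Atomic API), and I'm assuming that
__atomic_thread_fence(__ATOMIC_SEQ_CST) is the <fence> in question.  Which
instructions the builtins actually expand to (ll/sc or LSE) depends on the
compiler flags, which is exactly the issue discussed here.

```cpp
#include <cstdint>

// Hypothetical helper: the <fence> in the patterns above.
// Assumption: a seq_cst thread fence is the intended full barrier.
inline void full_fence() { __atomic_thread_fence(__ATOMIC_SEQ_CST); }

// cmpxchg: <fence> <relaxed-cmpxchg> <fence>
// Returns the value that was in *dest before the operation.
template <typename T>
inline T conservative_cmpxchg(T* dest, T compare_value, T exchange_value) {
  T expected = compare_value;
  full_fence();
  // On failure the builtin stores the observed value into 'expected';
  // on success 'expected' already equals the old value.
  __atomic_compare_exchange_n(dest, &expected, exchange_value,
                              /*weak=*/false,
                              __ATOMIC_RELAXED, __ATOMIC_RELAXED);
  full_fence();
  return expected;
}

// Other ops: <release-op> <fence>
template <typename T>
inline T conservative_fetch_add(T* dest, T add_value) {
  T old_value = __atomic_fetch_add(dest, add_value, __ATOMIC_RELEASE);
  full_fence();
  return old_value;
}

template <typename T>
inline T conservative_xchg(T* dest, T exchange_value) {
  T old_value = __atomic_exchange_n(dest, exchange_value, __ATOMIC_RELEASE);
  full_fence();
  return old_value;
}
```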

For now I'm only going to discuss the other ops, not cmpxchg.  (cmpxchg is
its own ball of ugliness, but I think I know how to deal with it.)

8261027 introduced support for using LSE atomics.  There are some surprises:

(A) The default (ll/sc) implementations (atomic_linux_aarch64.S) are all
"acq-rel" rather than "release".

(B) The generated LSE stubs for "add" variants are "acq-rel" rather than
"release".  Meanwhile, the generated stubs for "xchg" variants are "release".

(A) seems like a mistake.  The inconsistency in (B) also seems like a
mistake. The question of which one is correct takes us to the next change.

8261649 is intended to optimize the use of LSE atomics, though the bug only
talks about cmpxchg.  (It also fixes a problem with the timing of stub
generation vs possible use, but that doesn't matter for this discussion.)

There is a big comment in front of the new stub generation code, talking about
how an acq-rel operation doesn't need a preceding fence when using LSE
atomics.  I can see how that's very useful for cmpxchg.  (And the comment
mostly discusses cmpxchg.)  But I'm not certain of its relevance for other
operations.

Can non-cmpxchg operations still be implemented as <release-op> <fence> when
(potentially) using LSE instructions?  It seems like an argument similar to
the one for ll/sc could be made.  If so, then we can use __atomic (and
-moutline-atomics) to easily implement them.  If not, that's a somewhat
unpleasant semantic change going from ll/sc to LSE.  (And probably breaks the
current bsd_aarch64 implementation.)  It also gives us a couple of choices for
implementation:

(C) Use <acq-rel-op> <fence> with the op generated using __atomic (and using
-moutline-atomics), in case we end up using LSE.  This accepts the unneeded
acquire if using ll/sc as a necessary evil for simplicity of implementation.
(It also matches the current code, though generated differently.)

(D) Avoid -moutline-atomics, doing our own use-LSE dispatch to either ll/sc or
LSE implementations.  There are several ways to do this, including the
approach taken by the current code.
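
To make the two choices concrete, here is a rough sketch.  All names are
hypothetical.  The "LSE" variant below is only a stand-in written with the same
builtin, since a real one would be a hand-written or generated stub, and
whether it needs the acquire and/or the trailing fence is the open question.
The dispatch through a function pointer is just one of the possible ways to
do (D).

```cpp
#include <cstdint>

// Option (C): <acq-rel-op> <fence>, leaving ll/sc vs LSE selection to the
// compiler (e.g. via -moutline-atomics).
template <typename T>
inline T fetch_add_acq_rel_then_fence(T* dest, T add_value) {
  T old_value = __atomic_fetch_add(dest, add_value, __ATOMIC_ACQ_REL);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  return old_value;
}

// Option (D): our own dispatch.  Hypothetical function-pointer scheme.
typedef int64_t (*fetch_add_8_fn)(volatile int64_t* dest, int64_t add_value);

// Default variant: <release-op> <fence>, as for ll/sc.
static int64_t fetch_add_8_default(volatile int64_t* dest, int64_t add_value) {
  int64_t old_value = __atomic_fetch_add(dest, add_value, __ATOMIC_RELEASE);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  return old_value;
}

// Stand-in for an LSE variant.  Assumption: uses <acq-rel-op> <fence>;
// a real implementation would use LSE instructions directly.
static int64_t fetch_add_8_lse_standin(volatile int64_t* dest, int64_t add_value) {
  int64_t old_value = __atomic_fetch_add(dest, add_value, __ATOMIC_ACQ_REL);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  return old_value;
}

static fetch_add_8_fn fetch_add_8_impl = fetch_add_8_default;

// Called once at startup after CPU feature detection (detection not shown).
inline void select_atomic_impl(bool cpu_has_lse) {
  fetch_add_8_impl = cpu_has_lse ? fetch_add_8_lse_standin : fetch_add_8_default;
}

inline int64_t dispatched_fetch_add_8(volatile int64_t* dest, int64_t add_value) {
  return fetch_add_8_impl(dest, add_value);
}
```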

So do we really need <acq-rel-op> <fence> when using LSE instructions?  Or can
we continue to use <release-op> <fence>?

I think there are also some problems with the big block comment for the stub
generation.  For example, I can't make sense of the model-based test
description.  For one thing, it talks about the resulting X3 and X4, but those
don't appear in the test code.  (I'm also not getting the point about a lack
of barrier-ordered-after, but maybe I'm just confused.)