RFC: linux-aarch64 and LSE support
Andrew Haley
aph-open at littlepinkcloud.com
Tue Oct 4 12:24:21 UTC 2022
Just a couple of things...
On 10/4/22 11:33, Andrew Dinn wrote:
> On 03/10/2022 20:16, Kim Barrett wrote:
...
>> So I think that for memory_order_conservative we want to use:
>>
>> 1. For ll/sc cmpxchg: ldxr/stxr with leading and trailing dmb ish, e.g. a
>> Relaxed operation with both leading and trailing fences.
>
> Looks right to me.
>
>> 2. For ll/sc non-cmpxchg: ldxr/stlxr with trailing dmb ish, e.g. a Release
>> operation with a trailing fence.
>
> Also looks right to me.
>
>> 3. For LSE: an acq-rel instruction for all operations.
>
> Also looks right to me.
>
>> That looks like what the linux kernel is using. It also mostly agrees with
>> a recent change to gcc outline-atomics support:
>>
>> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bc25483c055d62f94f8c289f80843dda3c4a6ff4
>> 2022-05-13 Sebastian Pop <spop at amazon.com>
>> PR target/105162
>> (included in gcc12.2)
>>
>> Unfortunately, that change doesn't implement ll/sc cmpxchg the way we want
>> (item 1), instead using ldxr/stlxr+dmb. That's wrong,
It's not wrong, but it is different: GCC's CAS atomics have never
guaranteed ordering if the CAS fails. In other words, it's not possible
to synchronize with a store that did not take place.
>> according to the
>> afore-referenced linux kernel patch (an analysis I agree with). (There is also
>> the problem of the __sync suite not having something for Atomic::xchg.)
Huh? __atomic_exchange_n().
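For illustration, a minimal sketch of an exchange built on that builtin. The helper name xchg_sketch is hypothetical, and the choice of __ATOMIC_SEQ_CST is an assumption; whether that ordering alone is enough for memory_order_conservative is the question this thread is about.

```cpp
#include <cstdint>

// Hypothetical helper, not HotSpot code: atomically store new_value into
// *dest and return the value that was there before.
template <typename T>
inline T xchg_sketch(volatile T* dest, T new_value) {
  // __atomic_exchange_n is a GCC/Clang builtin for integral and pointer
  // types. The memory order here is an assumption made for the sketch.
  return __atomic_exchange_n(dest, new_value, __ATOMIC_SEQ_CST);
}
```

A call such as xchg_sketch<int64_t>(&v, 2) returns the previous contents of v and leaves 2 in it.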
>> So how do we get the code we want? There doesn't seem to be a way for us to
>> use gcc intrinsics.
One way to solve this is, I suspect, to realize that ldxr/stxr is only
used when LSE is not available, and all contemporary AArch64
implementations support LSE. Therefore, ldxr/stxr is legacy only, and
it barely matters if it's somewhat suboptimal.
So, maybe all we have to do is use GCC's operations and throw in a DMB
ISH after __atomic ops, when needed. To do that we could use outline
atomics and a
    if (!LSE) __sync_synchronize();
or with a function pointer
    (*maybe_dmb)();
in the case of memory_order_conservative. That should add a single well-
predicted branch. We could benchmark that and see if it'll do.
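A rough sketch of both variants, using fetch-and-add as the example operation. Everything ending in _sketch is a hypothetical name, the flag stands in for whatever LSE detection is actually used, and __ATOMIC_ACQ_REL is an assumption based on item 3 above, not a measured or verified choice.

```cpp
#include <cstdint>

// Assumption: set once at startup from CPU feature detection.
static bool has_lse_sketch = false;

inline void dmb_sketch() { __sync_synchronize(); }  // full barrier
inline void no_dmb_sketch() {}                      // nothing extra needed

// Function-pointer variant: points at the fence only when LSE is absent.
static void (*maybe_dmb)() = dmb_sketch;

inline void select_fence_sketch(bool lse) {
  has_lse_sketch = lse;
  maybe_dmb = lse ? no_dmb_sketch : dmb_sketch;
}

// Branch variant: GCC's atomic op, then a trailing fence if not LSE.
inline int64_t fetch_add_branch_sketch(volatile int64_t* dest, int64_t add) {
  int64_t old = __atomic_fetch_add(dest, add, __ATOMIC_ACQ_REL);
  if (!has_lse_sketch) __sync_synchronize();
  return old;
}

// Same operation using the function pointer instead of the branch.
inline int64_t fetch_add_fnptr_sketch(volatile int64_t* dest, int64_t add) {
  int64_t old = __atomic_fetch_add(dest, add, __ATOMIC_ACQ_REL);
  (*maybe_dmb)();
  return old;
}
```

Which of the two is cheaper would have to come out of the benchmark; the sketch only shows the shape of the code.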
We could also do something like this:
    if (CAS failed) __sync_synchronize();
in the non-LSE case.
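Sketched out, assuming a hypothetical has_lse_cas_sketch flag and __ATOMIC_SEQ_CST for both the success and failure orders (both are assumptions made for the example):

```cpp
#include <cstdint>

// Assumption: set once at startup from CPU feature detection.
static bool has_lse_cas_sketch = false;

// Hypothetical helper: returns the value found in *dest, which equals
// compare_value exactly when the exchange happened.
inline int64_t cmpxchg_sketch(volatile int64_t* dest,
                              int64_t compare_value,
                              int64_t exchange_value) {
  int64_t expected = compare_value;
  bool ok = __atomic_compare_exchange_n(dest, &expected, exchange_value,
                                        /*weak=*/false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  // On failure no store took place, so add the fence by hand, and only
  // on the non-LSE path.
  if (!ok && !has_lse_cas_sketch) __sync_synchronize();
  // On failure the builtin writes the observed value into expected; on
  // success expected still holds compare_value, which was the old value.
  return expected;
}
```

The extra fence is only executed on the failure path, so the successful CAS pays nothing beyond the branch.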
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671