RFR: 8224851: AArch64: fix warnings and errors with Clang and GCC 8.3
Nick Gasson
nick.gasson at arm.com
Thu Jun 13 08:36:02 UTC 2019
Hi Kim,
Thanks for the feedback. I'll make these changes.
>
> Is there a reason to conditionalize use of __atomic_add_fetch for
> clang and continue to use __sync_add_and_fetch for gcc, rather than
> using __atomic_add_fetch unconditionally?
I think I did it that way because I didn't want to affect the GCC
build, but in fact GCC compiles both variants to the same instructions,
so we can get rid of the #ifdef and use __atomic_add_fetch
unconditionally.
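For reference, the unconditional version would look something like this
(a sketch of the shape of the change rather than the exact patch;
FULL_MEM_BARRIER is the existing HotSpot full-barrier macro mentioned
below):

  template<typename I, typename D>
  D add_and_fetch(I add_value, D volatile* dest) const {
    // Release ordering on the read-modify-write itself...
    D res = __atomic_add_fetch(dest, add_value, __ATOMIC_RELEASE);
    // ...then a trailing full barrier, to keep the semantics of
    // __sync_add_and_fetch, which ends with a full barrier.
    FULL_MEM_BARRIER;
    return res;
  }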
>
> Is __ATOMIC_RELEASE and a following FULL_MEM_BARRIER the right usage
> to get the equivalent of __sync_add_and_fetch? Or should
> __ATOMIC_SEQ_CST be used? Or should the latter be actively avoided?
> I remember some discussion and questions in that area, about how to
> implement memory_order_seq_cst on ARM processors, but wasn't paying
> close attention since I don't deal with ARM much.
>
Using __ATOMIC_RELEASE followed by FULL_MEM_BARRIER generates the same
instruction sequence as the existing __sync_add_and_fetch.
The __sync_XXX builtins have a full barrier at the end [1], so GCC
compiles this to:
.L2:
        ldxr    w2, [x1]        ; Load exclusive (no acquire)
        add     w2, w2, w0
        stlxr   w3, w2, [x1]    ; Store release
        cbnz    w3, .L2
        dmb     ish             ; Full barrier
        mov     w0, w2
        ret
Whereas using the __atomic builtin with __ATOMIC_SEQ_CST we get two
half-barriers like this:
.L2:
        ldaxr   w2, [x1]        ; Load acquire
        add     w2, w2, w0
        stlxr   w3, w2, [x1]    ; Store release
        cbnz    w3, .L2
        mov     w0, w2
        ret
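Both sequences are easy to reproduce with a reduced standalone test
case along these lines (hypothetical functions, not the HotSpot code
itself; compiled with gcc -O2 on AArch64):

  int add_sync(int *p, int v) {
      /* Full barrier at the end, as in the first listing above */
      return __sync_add_and_fetch(p, v);
  }

  int add_seq_cst(int *p, int v) {
      /* Half-barriers via ldaxr/stlxr, as in the second listing */
      return __atomic_add_fetch(p, v, __ATOMIC_SEQ_CST);
  }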
Thanks,
Nick
[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html