RFR: 8224851: AArch64: fix warnings and errors with Clang and GCC 8.3
Nick Gasson
nick.gasson at arm.com
Thu Jun 13 08:36:02 UTC 2019
Hi Kim,
Thanks for the feedback. I'll make these changes.
>
> Is there a reason to conditionalize use of __atomic_add_fetch for
> clang and continue to use __sync_add_and_fetch for gcc, rather than
> using __atomic_add_fetch unconditionally?
I think I did it that way because I didn't want to affect the GCC
build, but in fact GCC compiles both variants to the same instructions,
so we can get rid of the #ifdef and use __atomic_add_fetch
unconditionally.
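For reference, the unconditional version would look something like this
(a sketch of the shape of the change rather than the exact patch;
FULL_MEM_BARRIER is the existing HotSpot full-barrier macro mentioned
below):

  template<typename I, typename D>
  D add_and_fetch(I add_value, D volatile* dest) const {
    // Release ordering on the read-modify-write itself...
    D res = __atomic_add_fetch(dest, add_value, __ATOMIC_RELEASE);
    // ...then a trailing full barrier, to keep the semantics of
    // __sync_add_and_fetch, which ends with a full barrier.
    FULL_MEM_BARRIER;
    return res;
  }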
>
> Is __ATOMIC_RELEASE and a following FULL_MEM_BARRIER the right usage
> to get the equivalent of __sync_add_and_fetch? Or should
> __ATOMIC_SEQ_CST be used? Or should the latter be actively avoided?
> I remember some discussion and questions in that area, about how to
> implement memory_order_seq_cst on ARM processors, but wasn't paying
> close attention since I don't deal with ARM much.
>
Using __ATOMIC_RELEASE followed by FULL_MEM_BARRIER generates the same
instruction sequence as the existing __sync_add_and_fetch.
The __sync_XXX builtins have a full barrier at the end [1], so GCC
compiles this to:
.L2:
        ldxr    w2, [x1]        ; Load exclusive (no acquire)
        add     w2, w2, w0
        stlxr   w3, w2, [x1]    ; Store release
        cbnz    w3, .L2
        dmb     ish             ; Full barrier
        mov     w0, w2
        ret
Whereas using the __atomic builtin with __ATOMIC_SEQ_CST we get two
half-barriers like this:
.L2:
        ldaxr   w2, [x1]        ; Load acquire
        add     w2, w2, w0
        stlxr   w3, w2, [x1]    ; Store release
        cbnz    w3, .L2
        mov     w0, w2
        ret
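Both sequences are easy to reproduce with a reduced standalone test
case along these lines (hypothetical functions, not the HotSpot code
itself; compiled with gcc -O2 on AArch64):

  int add_sync(int *p, int v) {
      /* Full barrier at the end, as in the first listing above */
      return __sync_add_and_fetch(p, v);
  }

  int add_seq_cst(int *p, int v) {
      /* Half-barriers via ldaxr/stlxr, as in the second listing */
      return __atomic_add_fetch(p, v, __ATOMIC_SEQ_CST);
  }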
Thanks,
Nick
[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html