[aarch64-port-dev ] barrier issue in cmpxchgptr implementation
Tangwei (Euler)
tangwei6 at huawei.com
Tue Jun 30 14:34:12 UTC 2015
Hi All,
I checked the MacroAssembler::cmpxchgptr implementation in the JVM and found that it uses load-acquire/store-release followed by a full memory barrier to simulate the behavior of the x86 cmpxchg.
The code snippet from the function follows for your reference. In my opinion, cmpxchg must provide full bi-directional fence semantics. Can anyone help explain why a single
membar at the end is enough for cmpxchg? Or does the commented-out memory barrier at the beginning need to be added as well?
// membar(AnyAny) // is this needed?
retry_load:
ldaxr(tmp, addr);
cmp(tmp, oldv);
br(Assembler::NE, nope);
stlxr(tmp, newv, addr);
cbzw(tmp, succeed);
b(retry_load);
nope:
membar(AnyAny);
mov(oldv, tmp);
I also checked the source of __cmpxchg_mb; there, it seems two memory barriers are placed around the cmpxchg:
smp_mb();
ret = __cmpxchg(ptr, old, new, size);
smp_mb();
But after checking the GCC intrinsic with the following simple test case, I found that there is no barrier around the load-acquire/store-release pair.
I am a little confused now. Which instruction sequence should be chosen to simulate the x86 cmpxchg?
long foo (long *ptr, long old, long new)
{
return __sync_val_compare_and_swap (ptr, old, new);
}
Assembly:
ldaxr x3, [x0] // 22 aarch64_load_exclusivedi [length = 4]
cmp x3, x1 // 23 *cmpdi/1 [length = 4]
bne .L3 // 24 *condjump [length = 4]
stlxr w4, x2, [x0] // 25 aarch64_store_exclusivedi [length = 4]
cbnz w4, .L2 // 26 *cbnesi1 [length = 4]
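For comparison, a variant of the test case that brackets the intrinsic with explicit full barriers, mirroring the smp_mb() pattern above, could look like the sketch below. The function name cas_full_fence is hypothetical; __sync_synchronize() is GCC's full-barrier builtin. This is only meant to make the two candidate sequences easy to compare in the generated assembly. It does not show that the extra barriers are required.

```c
/* Hypothetical helper for comparison only; not taken from HotSpot
   or from the __cmpxchg_mb source. */
long cas_full_fence (long *ptr, long oldv, long newv)
{
  long prev;

  __sync_synchronize ();   /* full barrier before the exchange */
  prev = __sync_val_compare_and_swap (ptr, oldv, newv);
  __sync_synchronize ();   /* full barrier after the exchange */

  /* Value seen at *ptr before the operation; equals oldv on success. */
  return prev;
}
```

Compiling this next to foo with the same options should make any additional barrier instructions visible around the ldaxr/stlxr loop.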
Regards!
wei