Single byte Atomic::cmpxchg implementation

Fri Sep 12 02:07:18 UTC 2014

On 12/09/2014 4:43 AM, Erik Österlund wrote:
>
>> On 11 sep 2014, at 14:15, "David Holmes" <david.holmes at oracle.com> wrote:
>>
>>> On 11/09/2014 9:30 PM, Erik Österlund wrote:
>>>> On 11 Sep 2014, at 03:25, David Holmes <david.holmes at oracle.com> wrote:
>>>> The Atomic operations must provide full bi-directional fence semantics, so a full sync on entry is required in my opinion. I agree that the combination of bne+isync would suffice on the exit path.
>>>
>>> I see no reason for the atomic operations to support more than full acquire and release (hence sequential consistency) memory behaviour as well as atomic updates.
>>
>> If the atomic operation were itself indivisible then the suggested barriers pre- and post would be correct. But when the atomic operation is itself a sequence of instructions you also have to guard against reordering relative to the variable being atomically updated. So the sync is needed to provide a full two-way barrier between the code preceding the atomic op and the code within the atomic op.
>
> I see. AFAIK lwsync orders everything except StoreLoad. So I deduce the only potential hazard of replacing the write barrier with lwsync would be that the load link could be speculatively loaded like normal loads and lead to a false negative CAS? I didn't think lwarx could be speculatively loaded, but I see the point now if that is the case. (Note that false positives are still impossible because a reordered load link would fail its store conditional when attempting to commit, and the store conditional will not float above the lwsync.)
>
> If this is indeed the case, they should still be isync instead of sync, right?

isync? You mean lwsync?

I agree that a missing StoreLoad barrier prior to the load-linked should
not be a problem. But I'm unclear whether all the Power architectures
define lwsync exactly the same way (I have a Freescale reference which
does, but I don't know if IBM Power is the same). I defer to the PPC64
folk to have selected what seems the most generally appropriate form.
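
For readers following the fence discussion, here is a minimal portable
sketch of the placement being debated: a full two-way barrier on entry and
an acquire-style barrier on exit. It uses standard C++ atomics rather than
PPC64 assembly, so it is only an analogue of the sync / lwarx / stwcx. /
bne+isync sequence, not the HotSpot code. The names cmpxchg_fenced and
cmpxchg_fenced_example are hypothetical and do not appear in the thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not HotSpot's
// Atomic::cmpxchg). Returns the value observed in dest, which equals
// compare_value exactly when the exchange took place.
inline int8_t cmpxchg_fenced(std::atomic<int8_t>& dest,
                             int8_t compare_value,
                             int8_t exchange_value) {
  // Entry: full barrier, so earlier accesses cannot be reordered with
  // the load/store pair that makes up the CAS itself.
  std::atomic_thread_fence(std::memory_order_seq_cst);

  int8_t observed = compare_value;
  // The CAS itself is relaxed; ordering comes from the explicit fences.
  dest.compare_exchange_strong(observed, exchange_value,
                               std::memory_order_relaxed,
                               std::memory_order_relaxed);

  // Exit: acquire-style barrier, so later accesses cannot float above
  // the load performed by the CAS.
  std::atomic_thread_fence(std::memory_order_acquire);
  return observed;
}

// Small usage example for the hypothetical helper above.
inline void cmpxchg_fenced_example() {
  std::atomic<int8_t> flag{0};
  int8_t old_value = cmpxchg_fenced(flag, 0, 1);  // succeeds: 0 -> 1
  assert(old_value == 0);
  assert(flag.load() == 1);
}
```

Whether the entry barrier may be weakened to something like lwsync is
exactly the open question above; the sketch takes the conservative
choice.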

> Also, what about allowing programmers to use weak CAS like in more advanced atomics APIs? For most lock-free algorithms weak CAS is good enough since there is a retry loop anyway. And it would get rid of the awkward retry loop required for the case of context switching between LL and SC.

Allowing in what context? If such a need arose in the VM then we would 
certainly look at implementing whatever was necessary.
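
To make the weak CAS idea concrete, here is a minimal sketch in standard
C++ (not HotSpot code) of a lock-free update whose retry loop already
absorbs spurious failures, so a weak CAS is sufficient. The function
name atomic_fetch_max is hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical example: atomically raise target to at least candidate.
// Returns the value that was observed before any update.
inline int atomic_fetch_max(std::atomic<int>& target, int candidate) {
  int observed = target.load(std::memory_order_relaxed);
  // compare_exchange_weak may fail spuriously (for example when an LL/SC
  // reservation is lost). On any failure it reloads observed, and the
  // loop condition is simply evaluated again.
  while (observed < candidate &&
         !target.compare_exchange_weak(observed, candidate)) {
    // Nothing to do: observed now holds the current value.
  }
  return observed;
}

// Small usage example for the hypothetical helper above.
inline void atomic_fetch_max_example() {
  std::atomic<int> high_water{3};
  int before = atomic_fetch_max(high_water, 7);  // raises 3 to 7
  assert(before == 3);
  assert(high_water.load() == 7);
}
```

With a strong CAS the implementation needs its own inner loop to hide
spurious store-conditional failures; here the caller's loop does that
work.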

> But then it's a larger change suddenly which maybe isn't worth the trouble? :)

Indeed. The general correctness and performance concerns make changes in 
this area difficult.

David

> /Erik
>