Single byte Atomic::cmpxchg implementation

David Holmes david.holmes at
Thu Sep 11 13:14:58 UTC 2014

On 11/09/2014 9:30 PM, Erik Österlund wrote:
> On 11 Sep 2014, at 03:25, David Holmes <david.holmes at> wrote:
>> The Atomic operations must provide full bi-directional fence semantics, so a full sync on entry is required in my opinion. I agree that the combination of bne+isync would suffice on the exit path.
> I see no reason for the atomic operations to support more than full acquire and release (hence sequential consistency) memory behaviour as well as atomic updates.

If the atomic operation were itself indivisible, then the suggested
barriers before and after it would be correct. But when the atomic
operation is itself a sequence of instructions, you also have to guard
against reordering relative to the variable being atomically updated.
So the sync is needed to provide a full two-way barrier between the
code preceding the atomic op and the code within the atomic op.
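
To make the shape of that argument concrete, here is a minimal sketch in
portable C++11, not the HotSpot or PPC64 port code. The name
cmpxchg_conservative is hypothetical. The retry loop stands in for the
lwarx/stwcx. sequence, and the two seq_cst fences stand in for the full
barriers on entry and exit (on Power, compilers typically emit sync for
such a fence, but that mapping is an assumption about the toolchain).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper for illustration only; not HotSpot's Atomic::cmpxchg.
// Returns the value observed in *dest, which equals `expected` on success.
inline std::uint32_t cmpxchg_conservative(std::atomic<std::uint32_t>* dest,
                                          std::uint32_t expected,
                                          std::uint32_t desired) {
    // Full two-way barrier on entry: code preceding the operation may not
    // be reordered with the instruction sequence inside the loop.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    std::uint32_t observed = expected;
    // The operation itself is a sequence of steps. The weak form may fail
    // spuriously (much like a lost reservation), so retry in that case.
    while (!dest->compare_exchange_weak(observed, desired,
                                        std::memory_order_relaxed,
                                        std::memory_order_relaxed)) {
        if (observed != expected) {
            break;  // genuine mismatch: report the value that was seen
        }
    }

    // Barrier on exit. A full fence is the conservative choice here; the
    // thread suggests a lighter exit barrier may suffice on PPC.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return observed;
}
```

A caller compares the return value with the expected value to tell
success from failure. Weakening the entry fence to
std::memory_order_release would model the lwsync variant under
discussion.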

There was a very long discussion on this aspect of the atomic operations 
not that long ago.


> For this, I see no reason why a full sync rather than lwsync is required (for the write barrier). The XNU kernel implementation also uses lwsync for release semantics and isync for the acquire.
> Why would this be different for us? From the XNU kernel (note the choice of fences I argue for):
> compare_and_swap32_on64b:   // bool OSAtomicCompareAndSwapBarrier32( int32_t old, int32_t new, int32_t *value);
>         lwsync                  // write barrier, NOP'd on a UP
> 1:
>         lwarx   r7,0,r5
>         cmplw   r7,r3
>         bne--   2f
>         stwcx.  r4,0,r5
>         bne--   1b
>         isync                   // read barrier, NOP'd on a UP
>         li      r3,1
>         blr
> 2:
>         li      r8,-8           // on 970, must release reservation
>         li      r3,0            // return failure
>         stwcx.  r4,r8,r1        // store into red zone to release
>         blr
>> But this is a complex area, involving hardware that doesn't always follow the rules, so conservatism is understandable.
> As far as wrong hardware goes, I don't know what to do about that but can we confirm that there is hardware not doing the fences according to specification in particular?
> It becomes very difficult to respect incorrect hardware implementations in my opinion.
>> But this needs to be taken up with the PPC64 folk who did this port.
> I agree, it would be very helpful to hear the perspective of the ones who wrote our implementation.
> /Erik

More information about the hotspot-dev mailing list