Semantics of VarHandle CAS methods

Hans Boehm hboehm at google.com
Thu Jun 30 17:54:00 UTC 2016


On Thu, Jun 30, 2016 at 7:59 AM, Martin Buchholz <martinrb at google.com>
wrote:
>
> On Thu, Jun 30, 2016 at 4:19 AM, Doug Lea <dl at cs.oswego.edu> wrote:
>
> > On 06/30/2016 06:38 AM, Martin Buchholz wrote:
> >
> >> It's not only about naming.
> >>
> >> So yes, I'd like the name weakCompareAndSet to be the sequentially
> >> consistent version, BUT I'd also expect the next more relaxed version
> >> to be memory_order_acq_rel which we don't provide.
> >>
> >
> > There are a few flavors of C++ CAS/weakCAS that aren't explicitly
> > supported (also including those with different memory orders on
> > success and failure), because you can get the effects with
> > combinations of the supplied versions and explicit fences. Supporting
> > all of them would have added lots of seldom-used methods.
> >
> >
> One concrete proposal that removes a method:
>
> Replace
> weakCompareAndSetAcquire and weakCompareAndSetRelease with
> weakCompareAndSetAcquireRelease

Using standard implementation recipes, the AcquireRelease version is
significantly more expensive on Power and ARM, and there are many cases
(e.g. implementing lock acquisition or lock release) where you only need
one.  I'd be hesitant to do that.
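
To make the lock case concrete, here is a minimal sketch of a spin lock
that needs only acquire on the way in and only release on the way out.
The class name SpinLock and its field are hypothetical, made up for
illustration; the VarHandle calls are the ones under discussion.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; not part of the JDK.
final class SpinLock {
    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(SpinLock.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @SuppressWarnings("unused") // accessed only through STATE
    private int state; // 0 = free, 1 = held

    void lock() {
        // Acquire-only CAS: accesses in the critical section cannot be
        // reordered above it. The weak form may fail spuriously, so loop.
        while (!STATE.weakCompareAndSetAcquire(this, 0, 1)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        // Release-only store: accesses in the critical section cannot be
        // reordered below it.
        STATE.setRelease(this, 0);
    }
}
```

Neither path asks for both orderings at once, which is why folding the
two methods into a single AcquireRelease flavor would make code like
this pay for ordering it does not use.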

>
>
> >> I don't have a good intuition about how useful
> >> non-sequentially-consistent CASes are.
> >>
> >
> > Among other uses, fallible performance counters, and the bases for
> > combining with fences as above.
>
>
> We can keep the fully relaxed flavor for use with fences, so you can still
> get the effect of  weakCompareAndSetAcquire if you really want.

Adding an acquireFence after a relaxed operation is similar, but not
identical, to the acquire operation.  The acquire operation does not
order, for example, a load before the CAS with respect to a load after
the CAS.  The fence version does.  Usually you want the acquire version.

On ARMv8, this will probably impact implementation cost, in that the fence
version actually has to generate a separate fence instruction.
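
The two variants being compared look roughly like this in code.  This is
only a sketch: the class Flag, its field, and the method names are
hypothetical, and the retry loops exist only because the weak CAS forms
may fail spuriously.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; not part of the JDK.
final class Flag {
    private static final VarHandle V;
    static {
        try {
            V = MethodHandles.lookup()
                    .findVarHandle(Flag.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @SuppressWarnings("unused") // accessed only through V
    private int value; // 0 = unclaimed, 1 = claimed

    // Variant A: the CAS itself is the acquire operation. Loads that
    // precede it in program order are not ordered against later loads.
    boolean claimWithAcquireCas() {
        while ((int) V.getOpaque(this) == 0) {
            if (V.weakCompareAndSetAcquire(this, 0, 1)) {
                return true;
            }
        }
        return false;
    }

    // Variant B: relaxed CAS followed by a standalone fence. The fence
    // also orders every earlier load against later loads and stores,
    // which is strictly more than variant A promises.
    boolean claimWithFence() {
        boolean won = false;
        while (!won && (int) V.getOpaque(this) == 0) {
            won = V.weakCompareAndSetPlain(this, 0, 1);
        }
        VarHandle.acquireFence();
        return won;
    }
}
```

Both return the same answers; the difference is only in which
reorderings are permitted around the call, and in whether the
implementation has to emit a separate fence.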

>
> In C++ we have sequential consistency of the single memory location
> holding an atomic (cache coherence).  Should we say something about
> locations updated via a VarHandle?  If you call
> weakCompareAndSetAcquire, then the spec says the write is a "plain
> write", but some kind of cache coherence seems inherent in the idea of
> a compareAndSet, so the write itself has to be complete and visible.

I completely agree that we need to be clear about cache coherence, and
not just for CAS.  I think a lot of code cares whether these operations
are cache coherent or not.

One could argue that the relaxed operations refer to ordinary Java
memory semantics and are thus clearly not cache coherent, unlike
memory_order_relaxed in C++.

But I think this shows up with acquire/release as well.  Using
acquire/release get/set everywhere, can I get

Thread 1:
x = 1;
r1 = x (sees 2)

Thread 2:
x = 2;
r2 = x (sees 1)    ?

I think this is entirely consistent with happens-before constraints.
But it is rather mind-bending, in that, no matter what the final value
turns out to be after the threads join, it will look to one thread like
the value of x changed twice.  This behavior is disallowed in C++, even
for memory_order_relaxed.
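
For anyone who wants to poke at this, the two-thread example can be
written as a small litmus harness along the following lines.  The class
CoherenceLitmus and its methods are hypothetical, and the second
thread's result is stored in a field called r2 here to keep the two
reads apart.  A thread-per-trial harness like this is a crude
instrument, so never seeing the outcome on a given machine says nothing
about whether the spec permits it.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical litmus harness; not part of the JDK.
final class CoherenceLitmus {
    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findVarHandle(CoherenceLitmus.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @SuppressWarnings("unused") // accessed only through X
    private int x;
    private int r1; // written by thread 1, read after join
    private int r2; // written by thread 2, read after join

    // Runs one trial and returns {r1, r2}.
    static int[] runOnce() {
        CoherenceLitmus s = new CoherenceLitmus();
        Thread t1 = new Thread(() -> {
            X.setRelease(s, 1);
            s.r1 = (int) X.getAcquire(s);
        });
        Thread t2 = new Thread(() -> {
            X.setRelease(s, 2);
            s.r2 = (int) X.getAcquire(s);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { s.r1, s.r2 };
    }

    // Counts trials in which each thread saw the other's write,
    // i.e. the r1 == 2, r2 == 1 outcome asked about above.
    static int countSurprising(int trials) {
        int n = 0;
        for (int i = 0; i < trials; i++) {
            int[] r = runOnce();
            if (r[0] == 2 && r[1] == 1) {
                n++;
            }
        }
        return n;
    }
}
```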


More information about the core-libs-dev mailing list