[jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS

Wed Jul 27 17:55:44 UTC 2016

Peter Dimov gave a good example in the C++ discussions of why one would
want to merge atomic operations: reference counting. If you see two
reference count increments in a row, you clearly want to merge the
underlying fetch_and_add operations.
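
To make the reference-counting case concrete, here is a minimal Java
sketch. The class and method names are hypothetical, and AtomicInteger
merely stands in for whatever counter a real implementation uses. The
merged form leaves the count in the same final state; it only hides the
intermediate value that another thread might otherwise have observed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountMerge {
    // Unmerged: two back-to-back atomic increments. Another thread could
    // observe the intermediate value (initial + 1) between them.
    static void retainTwiceSeparately(AtomicInteger refCount) {
        refCount.getAndIncrement();
        refCount.getAndIncrement();
    }

    // Merged: one fetch-and-add of 2. The intermediate value is never
    // visible, but every final state is one the unmerged form allows.
    static void retainTwiceMerged(AtomicInteger refCount) {
        refCount.getAndAdd(2);
    }

    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(1);
        AtomicInteger b = new AtomicInteger(1);
        retainTwiceSeparately(a);
        retainTwiceMerged(b);
        System.out.println(a.get() == b.get());
    }
}
```

Whether a compiler may turn the first method into the second on its own
is exactly the question under discussion; the sketch only shows what the
transformation would look like.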

(I say that in spite of the fact that I'm not a fan of explicit reference
counting, and am currently spending way too much time debugging
reference-counting code.  But it seems unavoidable at times, occasionally
even in Java, and pervasive in C++.)

I don't understand Doug's statement: "So implementations cannot be allowed
to merge reads in ways that are sure to reduce the number of possible
program traces."  We have hardware microarchitectures that do this on a
grand scale by transactionally committing a bunch of memory operations in
bulk (cf. http://dl.acm.org/citation.cfm?doid=1610252.1610271), so many
intermediate states are invisible. In general the rule is that an
implementation cannot add traces, but removing possible traces is entirely
fine.
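
As a rough illustration of what "removing traces" means for merged
reads, consider the Java sketch below. The names are hypothetical, and
get() simply stands in for whichever non-plain access mode is in
question. The merged form can only produce outcomes in which both reads
agree, and that outcome was already possible for the unmerged form, so
no trace is added.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MergedReads {
    // Unmerged: two separate reads. A concurrent writer could slip in
    // between them, so both r1 == r2 and r1 != r2 are possible outcomes.
    static boolean sawSameValueTwice(AtomicInteger x) {
        int r1 = x.get();
        int r2 = x.get();
        return r1 == r2;
    }

    // Merged: the second read reuses the first result. Only r1 == r2
    // remains, which is a subset of the unmerged outcomes.
    static boolean sawSameValueTwiceMerged(AtomicInteger x) {
        int r1 = x.get();
        int r2 = r1;
        return r1 == r2;
    }

    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger(42);
        System.out.println(sawSameValueTwice(x));
        System.out.println(sawSameValueTwiceMerged(x));
    }
}
```

Doug's position, as I read it, is that the second form should not be a
legal compilation of the first; mine is that it is indistinguishable
from what some hardware already does.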

My (failed) proposal to the C++ committee was to informally restrict
software transformations to those comparable to the hardware effects we
observe anyway.  I think that is the strongest property that code dealing
only with conventional memory (not device registers) can reliably test for.

On Tue, Jul 26, 2016 at 1:03 PM, Doug Lea <dl at cs.oswego.edu> wrote:

>
> Moving ever further away from the alleged subject line...
>
>
> On 07/26/2016 01:09 PM, Paul E. McKenney wrote:
>
>> On Mon, Jul 25, 2016 at 03:24:48PM -0400, Doug Lea wrote:
>>
>>> (Gratuitously editorializing, one would think that in C++,
>>> it might also be popular to adopt this interpretation, and
>>> eliminate the need to ever integrate C "volatile", or to
>>> re-spec consume mode.)
>>>
>>
>> Yes and no.
>>
>> If I am working on a low-level synchronization primitive, then yes,
>> I really do want the system to do -exactly- what I tell it to, no more,
>> no less.
>>
>> But in higher-level code, I would likely be quite happy for the compiler
>> to fuse accesses, if it could do so without violating the memory model.
>>
>>
> The C++-relaxed spec definitely shows this tension. Sometimes people
> want it to mean just "plain, but don't tear words".  Which is not the
> same as what you'd otherwise spec as "the cheapest mode for a
> thread-safe variable respecting coherence". In Java,
> with the availability of "Plain" accesses even for volatiles,
> and access-atomicity for references and <=32bit scalars,
> there is little motivation to compromise for Opaque mode.
>
> In which case, the main premise is that when users use non-plain
> access modes for reads (similarly, but less interestingly writes), they
> are expressing that they intend to handle all of the possible program
> traces that might result if two subsequent reads see different values.
> So implementations cannot be allowed to merge reads in ways that are
> sure to reduce the number of possible program traces.
>
> Again, this is symmetric to the idea that implementations cannot be
> allowed to add writes (e.g., duplicate them) in ways that are sure to
> increase the number of possible program traces.
>
> It is surely possible to introduce a formalization of traces that
> rigorously states both constraints. But it is not easy to define an
> underlying trace model that covers practical execution issues. So in a
> language spec, it may be preferable to just say no merged reads and no
> added writes for atomics. Which is what C++ and Java both do now for
> no-added-writes. Or, it may be a better idea to leave the trace-based
> requirements incompletely formalized, which should have the same
> practical effect. Or even better (but not soon) agree upon some formalism.
>
> -Doug
>
>