[jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS

Wed Jul 27 20:21:48 UTC 2016

I should have known better than to invite debate about C++ relaxed etc.
specs here. Sorry. These details of how they are spec'ed in C++17 don't
seem to matter to us, and should not. It's probably best to move that
discussion elsewhere.

To summarize though, C/C++ effectively has 6 modes: plain, relaxed,
consume, acquire/release, seq_cst, and the Linux pseudo-mode
*(volatile*)relaxed. Java (jdk9) effectively has only 4: plain,
opaque, acquire/release, volatile. Four appear to be enough -- modulo
other language differences, for every C++ construction, there is an
applicable Java construction that preserves essential properties
and is expected to be implemented in the same way in common
use cases. And to make this work out, we incorporate memory
system progress properties.
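
For concreteness, here is a minimal sketch of the four jdk9 modes applied
to a single field through a VarHandle. The class and field names
(AccessModes, x) are made up for illustration, and the C++ names in the
comments are only a rough correspondence, not a claim of exact equivalence.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AccessModes {
    int x; // ordinary field; the mode is chosen per access, not per declaration

    static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup().findVarHandle(AccessModes.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writes and reads the same field once in each of the four modes.
    static int demo() {
        AccessModes a = new AccessModes();

        X.set(a, 1);                     // plain write
        int p = (int) X.get(a);          // plain read

        X.setOpaque(a, 2);               // opaque (roughly: C++ relaxed)
        int o = (int) X.getOpaque(a);

        X.setRelease(a, 3);              // release write ...
        int r = (int) X.getAcquire(a);   // ... paired with an acquire read

        X.setVolatile(a, 4);             // volatile (roughly: C++ seq_cst)
        int v = (int) X.getVolatile(a);

        return p + o + r + v;            // single-threaded here, so 1 + 2 + 3 + 4
    }
}
```

The single-threaded demo only shows the API shape; the modes differ in
what they promise to other threads, which this sketch does not exercise.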

About which...

On 07/26/2016 03:26 PM, Hans Boehm wrote:

> [C++17]: "An implementation should ensure that the last value (in
> modification order) assigned by an atomic or synchronization operation will
> become visible to all other threads in a finite period of time."
>
> The reason I haven't pushed for something stronger is that I don't think
> hardware specifications consistently contain the corresponding guarantees,
> which would put language implementers in a weird position. But that could
> probably be argued either way.

Yes, just to clarify that other way: If a memory system not observing
progress guarantees were shipped, the designers would be blamed
for insufficiently specifying and testing its properties. To enable testing,
"eventually Predicate P" specs are normally phrased as: Every implementation
must pick and publish a K such that P always holds within K units (usually
clock cycles). The same holds in software, but without such convenient units,
sometimes leading to yet more arbitrary constants.

> I think the "no merge" rule is not really formally specifiable, since it's a
> compiler-only constraint that can't be tested by a conforming program.

Testing is not impossible but is less portable.
For a fun one, someone could try removing the volatile cast from the
Linux READ_ONCE macro, build, and check for test suite bugs.
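
A rough Java analogue of the READ_ONCE experiment, as a sketch only: a
spin loop that re-reads a flag in opaque mode on every iteration. The
names (OpaqueSpin, done, run) are hypothetical. Replacing getOpaque with
a plain read of the field would permit a compiler to hoist the read out
of the loop, which is the kind of merging under discussion; whether a
given JIT actually does so is implementation-dependent, hence the
portability problem.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class OpaqueSpin {
    boolean done; // deliberately not declared volatile

    static final VarHandle DONE;
    static {
        try {
            DONE = MethodHandles.lookup().findVarHandle(OpaqueSpin.class, "done", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns true if the waiter observed the flag and exited within the timeout.
    static boolean run() throws InterruptedException {
        OpaqueSpin s = new OpaqueSpin();
        Thread waiter = new Thread(() -> {
            // Each iteration performs a fresh opaque read; the reads may not
            // be collapsed into one read before the loop.
            while (!(boolean) DONE.getOpaque(s)) {
                Thread.onSpinWait();
            }
        });
        waiter.start();
        DONE.setOpaque(s, true);   // expected to become visible eventually
        waiter.join(10_000);       // generous bound so a failure shows up as a timeout
        return !waiter.isAlive();
    }
}
```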

On 07/27/2016 01:55 PM, Hans Boehm wrote:
> Peter Dimov gave a good example in C++ discussions for wanting merging of
> atomic operations: Reference counting. If you see two reference count
> increments in a row, you clearly want to merge the underlying fetch_and_add
> operations.

Is allowing this worth the loss in ability to prevent it in
unwanted cases, e.g., when a huge but finite number of these were
otherwise postponed in a long-lived but bounded loop?
In other words, it is possible to write a combining-based ref-counter
if you need one, but not to write a non-combining one when you don't.
(And again, whatever the answer, Java vs C++ differences in such cases
are not too important.)
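
To illustrate the asymmetry, a minimal sketch of a reference counter in
which combining is done explicitly by the programmer rather than by the
compiler. The class and method names (RefCount, retain, release) are
made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCount {
    // Starts at 1 for the creating owner.
    private final AtomicInteger count = new AtomicInteger(1);

    // Non-combining form: exactly one atomic RMW per new reference.
    void retain() {
        count.getAndIncrement();
    }

    // Explicitly combining form: the caller has batched n retains
    // and publishes them with a single atomic RMW.
    void retain(int n) {
        count.getAndAdd(n);
    }

    // Returns true when the last reference has been dropped.
    boolean release() {
        return count.decrementAndGet() == 0;
    }

    int current() {
        return count.get();
    }
}
```

If implementations may merge adjacent RMWs on their own, the first form
of retain() can no longer be relied on to stay un-merged, whereas the
second form is always available to code that wants combining.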

> I don't understand Doug's statement: "So implementations cannot be allowed to
> merge reads in ways that are sure to reduce the number of possible program
> traces."  We have hardware microarchitectures that do this on a grand scale
> by transactionally committing a bunch of memory operations in bulk

This is getting increasingly far afield, but within (most? all?) definitions
of transactions, multiple reads of the same variable are *required* to
take the same value (act as if merged), which is not exactly the same
as in any access mode. So something special would need to be said anyway
about what happens especially for atomics/volatiles.

(On the other hand, concurrent code with multiple reads of the same
atomic/volatile variable within a method or transaction is already highly
suspicious and a priori unlikely to be correct. So I agree with Hans that
most of the issues we are discussing cover cases that most programmers
should never encounter.)
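
A small sketch of why repeated reads are suspicious, using hypothetical
names (ReadOnce, suspiciousLength, safeLength): the first method reads
the same atomic twice and can fail if another thread writes in between;
the second reads once into a local and works from that snapshot.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReadOnce {
    // Suspicious: two separate reads of the same atomic. Another thread may
    // store null between them, so the second get() can return null even
    // though the first one did not.
    static int suspiciousLength(AtomicReference<String> ref) {
        return ref.get() == null ? 0 : ref.get().length();
    }

    // Usual idiom: read once into a local, then use only the local.
    static int safeLength(AtomicReference<String> ref) {
        String s = ref.get();
        return s == null ? 0 : s.length();
    }
}
```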

-Doug