[concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Martin Buchholz martinrb at google.com
Mon Dec 8 19:17:47 UTC 2014


On Mon, Dec 8, 2014 at 12:46 AM, David Holmes <davidcholmes at aapt.net.au> wrote:
> Martin,
>
> The paper you cite is about ARM and Power architectures - why do you think the lack of mention of x86/sparc implies those architectures are multiple-copy-atomic?

Reading some more in the same paper, I see:

"""Returning to the two properties above, in TSO a thread can see its
own writes before they become visible to other
threads (by reading them from its write buffer), but any write becomes
visible to all other threads simultaneously: TSO
is a multiple-copy atomic model, in the terminology of Collier
[Col92]. One can also see the possibility of reading
from the local write buffer as allowing a specific kind of local
reordering. A program that writes one location x then
reads another location y might execute by adding the write to x to the
thread’s buffer, then reading y from memory,
before finally making the write to x visible to other threads by
flushing it from the buffer. In this case the thread reads
the value of y that was in the memory before the new write of x hits memory."""
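To make the reordering described in that passage concrete, here is a minimal
store-buffering litmus test sketched in Java (the class and field names are
mine, purely for illustration). With plain fields, the outcome r1 == 0 && r2 == 0
is allowed under TSO, precisely because each thread's write may still be sitting
in its own store buffer when the other thread performs its read:

    // Hypothetical store-buffering (SB) litmus test; names are illustrative.
    class StoreBuffering {
        int x, y;             // plain fields, no ordering guarantees
        int r1, r2;

        void thread1() { x = 1; r1 = y; }   // write x, then read y
        void thread2() { y = 1; r2 = x; }   // write y, then read x
    }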

So (as you say) with TSO you don't have a total order of stores if you
read your own writes out of your own CPU's write buffer.  However, my
interpretation of "multiple-copy atomic" is that the publishing thread
can choose to write to memory with an instruction that has a
sufficiently strong memory barrier attached (e.g. LOCK;XXX on x86), so
that its write buffer is flushed, and then plain relaxed loads can be
used everywhere else to read those memory locations.  This explains the
situation on x86 and sparc, where volatile writes are expensive,
volatile reads are "free", and you still get sequential consistency for
Java volatiles.
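
As a sketch of what that cost split looks like at the Java level (my own
example, not from the paper): the writer pays for the buffer flush once, at
the volatile store, and every reader then gets the published value with what
is essentially an ordinary load on x86/sparc.

    // Illustrative one-shot publication; names are mine.  On x86/sparc the
    // volatile store of 'ready' is the expensive part (a locked instruction /
    // StoreLoad barrier that drains the write buffer), while the volatile
    // load of 'ready' in readers compiles to a plain load.
    class Publisher {
        int data;                    // plain field
        volatile boolean ready;      // publication flag

        void publish() {
            data = 42;
            ready = true;            // volatile write: costly, flushes buffer
        }

        Integer tryRead() {
            if (ready) {             // volatile read: essentially free on TSO
                return data;         // guaranteed to observe 42
            }
            return null;
        }
    }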

http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf


