[jmm-dev] Store completion query - general and ARM

David Holmes david.holmes at oracle.com
Thu Nov 10 20:49:45 UTC 2016

On 10/11/2016 7:54 PM, Peter Sewell wrote:
> If you want a more rigorous and concrete model that explains this, you
> might want to look at:
> http://www.cl.cam.ac.uk/~pes20/popl16-armv8/top.pdf

Thanks Peter, I will take a look at this.


> The associated tool: www.cl.cam.ac.uk/~pes20/AArch64/
> <http://www.cl.cam.ac.uk/~pes20/AArch64/>
> lets one run arbitrary model-allowed executions of tests:
> - select AArch64 test Tutorial/SB+dmb.sys
> - under Options, select "a larger set of transitions which we proved can
> be taken eagerly"
> - click Run, then it shows the initial state of the model with the
> initial transitions highlighted in green
> - take all the thread-local transitions of each thread (five each)
> - now you can see each thread's write, dmb, and read request in the
> "flowing model" abstract storage subsystem
> - in this model, the dmb sys keeps the write and read request in order
> as they flow down to memory, so no interleaving of the possible model
> transitions can break the Dekker's algorithm property.
> In this example, it happens that the read requests also can't be
> satisfied from writes that haven't hit main memory, but in general they
> can be satisfied earlier.
> For contrast, if you try the SB test without dmb, you'll many more
> possible executions.
> This flowing model is actually a bit more microarchitectural than one
> would like for an architectural spec, as it exposes the abstract
> interconnect topology.  The POP model, also provided by that tool,
> abstracts from the topology.  Both are principally due to Shaked Flur, cc'd.
> best,
> Peter
> On 10 November 2016 at 09:31, Andrew Haley <aph at redhat.com
> <mailto:aph at redhat.com>> wrote:
>     On 10/11/16 09:20, David Holmes wrote:
>     > On 10/11/2016 7:07 PM, Andrew Haley wrote:
>     >> On 10/11/16 00:06, David Holmes wrote:
>     >>> Does any part of the JMM require actual visibility/completion of
>     >>> volatile stores or is it only order that is defined (with an assumptions
>     >>> that all stores will complete in a finite time)?
>     >>
>     >> Ordering is really all that we've got: all that memory fences can do
>     >> is ensure that visibility of loads and stores is ordered in some way.
>     >
>     > If we establish some global order of loads and stores, yes. That can in
>     > turn require that a store become visible prior to a given load.
>     I agree.
>     >>> In relation to ARM specifically, Dekker style algorithms require
>     >>> visibility/completion of the store before the subsequent load, yet the
>     >>> example in "A Tutorial Introduction to the ARM and POWER Relaxed Memory
>     >>> Models" shows the use of DMB, not DSB.
>     >>
>     >> DMB is fine for that.  Dekker doesn't need a store to be forced out of
>     >> the caches, only that the store be made visible to other processors
>     >> before any operations later in program order.
>     >
>     > Again it is far from obvious to me that DMB causes the store to be
>     > visible before any operations later in program order. I find the Group A
>     > / Group B formulation (and even the definition of "observe") to be quite
>     > obscure and hard to map to actual code behaviour.
>     Indeed.  The real problem is that ARM are trying to describe the
>     memory model in an abstract way that does not overly constrain
>     implementations.  But a DMB really is sufficient to ensure that prior
>     stores are visible.  (Mind you, we don't need DMB to get sequentially-
>     consistent behaviour that's enough for Java volatiles.)
>     >>> Yet AFAICS DMB says nothing about completion whereas DSB does. ??
>     >>> (To be honest I find the Group A/B description of DMB properties
>     >>> extremely hard to actually interpret wrt code like Dekker.)
>     >>
>     >> DSB is only really needed if there are multiple caches of the same
>     >> address, i.e. Icache and Dcache: it's necessary to force a store out
>     >> into main memory in order to refresh he Icache.
>     >
>     > I thought only ISB had an effect relative to instructions/i-cache ??
>     It does: you need DSB to ensure the visibility of the data cleaned
>     from the Dcache, then ISB to synchronize the fetched instruction
>     stream.
>     Andrew.

More information about the jmm-dev mailing list