[jmm-dev] Store completion query - general and ARM
David Holmes
david.holmes at oracle.com
Thu Nov 10 20:49:45 UTC 2016
On 10/11/2016 7:54 PM, Peter Sewell wrote:
> If you want a more rigorous and concrete model that explains this, you
> might want to look at:
> http://www.cl.cam.ac.uk/~pes20/popl16-armv8/top.pdf
Thanks Peter, I will take a look at this.
David
> The associated tool: www.cl.cam.ac.uk/~pes20/AArch64/
> <http://www.cl.cam.ac.uk/~pes20/AArch64/>
> lets one run arbitrary model-allowed executions of tests:
> - select AArch64 test Tutorial/SB+dmb.sys
> - under Options, select "a larger set of transitions which we proved can
> be taken eagerly"
> - click Run, then it shows the initial state of the model with the
> initial transitions highlighted in green
> - take all the thread-local transitions of each thread (five each)
> - now you can see each thread's write, dmb, and read request in the
> "flowing model" abstract storage subsystem
> - in this model, the dmb sys keeps the write and read request in order
> as they flow down to memory, so no interleaving of the possible model
> transitions can break the Dekker's algorithm property.
>
> In this example, it happens that the read requests also can't be
> satisfied from writes that haven't hit main memory, but in general they
> can be satisfied earlier.
>
> For contrast, if you try the SB test without dmb, you'll many more
> possible executions.
>
> This flowing model is actually a bit more microarchitectural than one
> would like for an architectural spec, as it exposes the abstract
> interconnect topology. The POP model, also provided by that tool,
> abstracts from the topology. Both are principally due to Shaked Flur, cc'd.
>
> best,
> Peter
>
>
> On 10 November 2016 at 09:31, Andrew Haley <aph at redhat.com
> <mailto:aph at redhat.com>> wrote:
>
> On 10/11/16 09:20, David Holmes wrote:
> > On 10/11/2016 7:07 PM, Andrew Haley wrote:
> >> On 10/11/16 00:06, David Holmes wrote:
> >>> Does any part of the JMM require actual visibility/completion of
> >>> volatile stores or is it only order that is defined (with an assumptions
> >>> that all stores will complete in a finite time)?
> >>
> >> Ordering is really all that we've got: all that memory fences can do
> >> is ensure that visibility of loads and stores is ordered in some way.
> >
> > If we establish some global order of loads and stores, yes. That can in
> > turn require that a store become visible prior to a given load.
>
> I agree.
>
> >>> In relation to ARM specifically, Dekker style algorithms require
> >>> visibility/completion of the store before the subsequent load, yet the
> >>> example in "A Tutorial Introduction to the ARM and POWER Relaxed Memory
> >>> Models" shows the use of DMB, not DSB.
> >>
> >> DMB is fine for that. Dekker doesn't need a store to be forced out of
> >> the caches, only that the store be made visible to other processors
> >> before any operations later in program order.
> >
> > Again it is far from obvious to me that DMB causes the store to be
> > visible before any operations later in program order. I find the Group A
> > / Group B formulation (and even the definition of "observe") to be quite
> > obscure and hard to map to actual code behaviour.
>
> Indeed. The real problem is that ARM are trying to describe the
> memory model in an abstract way that does not overly constrain
> implementations. But a DMB really is sufficient to ensure that prior
> stores are visible. (Mind you, we don't need DMB to get sequentially-
> consistent behaviour that's enough for Java volatiles.)
>
> >>> Yet AFAICS DMB says nothing about completion whereas DSB does. ??
> >>> (To be honest I find the Group A/B description of DMB properties
> >>> extremely hard to actually interpret wrt code like Dekker.)
> >>
> >> DSB is only really needed if there are multiple caches of the same
> >> address, i.e. Icache and Dcache: it's necessary to force a store out
> >> into main memory in order to refresh he Icache.
> >
> > I thought only ISB had an effect relative to instructions/i-cache ??
>
> It does: you need DSB to ensure the visibility of the data cleaned
> from the Dcache, then ISB to synchronize the fetched instruction
> stream.
>
> Andrew.
>
>
>
More information about the jmm-dev
mailing list