[jmm-dev] avoiding thin-air via an operational out-of-order semantics construction

Tue Apr 1 20:47:58 UTC 2014

It seems to me that there are much more common compiler analyses than
value-range analysis per se, that cause problems here.

I think that any sort of conventional alias or pointer analysis would also
be disallowed, as is, since it's based on reasoning that no thread can
perform certain actions.  That's not an issue until we get to references,
but we will clearly have to get there.

I think escape analysis has similar issues.  It allows the compiler to
prove that certain "shared" variable reads are in fact locally determined,
and thus not all edges in the local transition system need to be considered.

This does strike me as an improvement over prior attempts to avoid
over-ordering relaxed memory accesses, but I remain very worried about the
complexity we're going to end up with, especially if we also need a a
Power-like model of the memory system.

I'm perfectly happy to ignore progress issues, like the read-elimination
issue for now.  My conclusion from C++ experiences is that we need to
weasel-word progress guarantees anyway (we don't get much in the way of
absolute guarantees from the hardware, and if the hardware doesn't make
progress, Java doesn't either), and we can probably do that after the fact
in a way that's unlikely to be misunderstood.  Partial correctness seems
hard enough.

Hans

On Tue, Apr 1, 2014 at 6:55 AM, Peter Sewell <Peter.Sewell at cl.cam.ac.uk>wrote:

> On 31 March 2014 21:03, Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> wrote:
> > On Mon, Mar 31, 2014 at 02:01:27PM +0100, Peter Sewell wrote:
> >> Dear all,
> >>
> >> building on the summary of the thin-air problem that we sent around
> >> earlier (http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html, Mark
> >> Batty+Peter Sewell),  we've sketched a semantics that gives the
> >> desired non-thin-air behaviour for at least some examples, for
> >> C11-style relaxed atomics / Linux-C ACCESS_ONCE accesses / Java
> >> volatiles.   At present it only covers reorderings, not eliminations
> >> and introductions, though we think it can probably be extended to
> >> those.
> >>
> >> The basic idea is to assume that compilers do not do inter-thread
> >> value range optimisations, using that to let us construct a
> >> thread-local out-of-order operational semantics out of a conventional
> >> thread-local (value-passing) labelled transition system. We then
> >> combine the resulting thread semantics with an operational
> >> non-multi-copy-atomic storage subsystem, as in our PLDI11 Power model,
> >> to be sound w.r.t. Power and ARM hardware.  The result is (we think)
> >> attractively simple to understand.   The big question is whether that
> >> assumption is reasonable (or can be made so),  and we'd welcome
> >> thoughts on that.
> >>
> >> The semantics is described here:
> >>
> >> http://www.cl.cam.ac.uk/~pes20/cpp/notes44.html  (Jean Pichon + Peter
> Sewell)
> >>
> >> (we've also got a formal version in Lem and a command-line prototype
> >> tool for exploring it).
> >>
> >> Comments would be very welcome
> >
> > I like the restriction that a given action can be hoisted ahead of
> > conditionals only if that action is performed in all cases, and if there
> > are no preceding fences or preceding actions to the same location as
> > the action in question.
>
> > I take it that the need to exclude value-range
> > analysis is also behind your discomfort with Linus Torvalds's and Torvald
> > Riegel's recent proposal for dependency ordering.
>
> I've not got that swapped in right now, but I think it suffered from
> different issues.  I'm perfectly fine with excluding value-range
> analysis so long
> as compiler folk can be persuaded to.
>
> > On status of variables, some ways of approximating the different sets:
> >
> > o       Auto variables whose address is never taken are local.
> >         (Yes, you could infer one auto variable's address from
> >         one of its counterparts, but when you do that you are
> >         taking your program's life into your hands anyway!)
> >
> > o       Variables marked __thread whose addresses are not taken
> >         are local.  The Linux kernel has a similar notion of
> >         per-CPU variables.
> >
> > o       Memory just allocated is private until an address is
> >         published somewhere.
> >
> > All that aside, I don't know of any way to tell the compiler that a
> > given location that was previously public has now been privatized
> > (such privatization is common in RCU code).
> >
> > On memory action elimination and introduction, when you say (for example)
> > read-after-write, how long after the write?
> > I am guessing that if
> > there is a fence or a RMW atomic between the read and the write, the
> > read cannot be eliminated.
>
> We'd have to think (in the little language that Jean and I have worked
> out some details, there are no RMWs).
>
> > One simples example that would be broken by
> > overly energetic read-after-write elimination is the queued spinlock,
> > which initialized its queue element, then spins waiting for the lock
> > holder to change the value.
>
> One might perhaps want to distinguish between eliminating finitely
> many occurrences of a read and eliminating all of them.
>
> >  (But you could reasonably argue that such
> > uses should have volatile semantics.)
> >
> > On non-atomics, you lost me on the "if there is a data race on
> non-atomics
> > performed out-of-order, then there is one with non-atomics performed
> > in order".  This might be because I am assuming that non-out-of-order
> > non-atomics might benefit from hardware control dependencies, and perhaps
> > you are excluding those.
>
> I think so - our semantics doesn't explicitly know about h/w control
> dependency preservation.
>
> thanks,
> peter
>
>
> >  For C/C++, data races on non-atomic accesses
> > result in undefined behavior, which can certainly include out-of-thin-air
> > values, right?
> >
> >                                                         Thanx, Paul
> >
>