[jmm-dev] avoiding thin-air via an operational out-of-order semantics construction

Tue Apr 1 23:21:02 UTC 2014

On 1 April 2014 21:47, Hans Boehm <boehm at acm.org> wrote:
> It seems to me that there are much more common compiler analyses than
> value-range analysis per se, that cause problems here.
>
> I think that any sort of conventional alias or pointer analysis would also
> be disallowed, as is, since it's based on reasoning that no thread can
> perform certain actions.  That's not an issue until we get to references,
> but we will clearly have to get there.

That's a good point, though there are some partial mitigations:

a) we only need this for shared accesses, so whenever the compiler can
persuade itself that a location is thread-local, any value restriction
becomes legitimate

b) it would be easy to restrict to type-correct values (modulo exactly
what C one is aiming at - whether you want to enable type-based alias
analysis or not, for example)

c) as Viktor points out in other email, we could add a dynamic
value-range-restriction transition and aborting if that gets violated
(or adding a proof obligation).  That's not so nice for an executable
semantics, but not hard to deal with semantically.

Whether those take us to a viable proposal I don't know.

> I think escape analysis has similar issues.  It allows the compiler to prove
> that certain "shared" variable reads are in fact locally determined, and
> thus not all edges in the local transition system need to be considered.

Semantically we have to figure out the shared accesses in any case.

> This does strike me as an improvement over prior attempts to avoid
> over-ordering relaxed memory accesses, but I remain very worried about the
> complexity we're going to end up with, especially if we also need a a
> Power-like model of the memory system.

The latter is (I'd say) the "simple" part of the Power model - any
non-multi-copy-atomic operational semantics will have to have
something a bit like it.  Overall complexity is hard to assess without
coping with all the C11 memory orders and RMWs, but, if this does get
us a satisfactory non-thin-air definition,I don't see any reason to
think it'd be unmanageable.

> I'm perfectly happy to ignore progress issues, like the read-elimination
> issue for now.  My conclusion from C++ experiences is that we need to
> weasel-word progress guarantees anyway (we don't get much in the way of
> absolute guarantees from the hardware, and if the hardware doesn't make
> progress, Java doesn't either), and we can probably do that after the fact
> in a way that's unlikely to be misunderstood.  Partial correctness seems
> hard enough.

y, indeed

thanks,
Peter

> Hans
>
>
> On Tue, Apr 1, 2014 at 6:55 AM, Peter Sewell <Peter.Sewell at cl.cam.ac.uk>
> wrote:
>>
>> On 31 March 2014 21:03, Paul E. McKenney <paulmck at linux.vnet.ibm.com>
>> wrote:
>> > On Mon, Mar 31, 2014 at 02:01:27PM +0100, Peter Sewell wrote:
>> >> Dear all,
>> >>
>> >> building on the summary of the thin-air problem that we sent around
>> >> earlier (http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html, Mark
>> >> Batty+Peter Sewell),  we've sketched a semantics that gives the
>> >> desired non-thin-air behaviour for at least some examples, for
>> >> C11-style relaxed atomics / Linux-C ACCESS_ONCE accesses / Java
>> >> volatiles.   At present it only covers reorderings, not eliminations
>> >> and introductions, though we think it can probably be extended to
>> >> those.
>> >>
>> >> The basic idea is to assume that compilers do not do inter-thread
>> >> value range optimisations, using that to let us construct a
>> >> thread-local out-of-order operational semantics out of a conventional
>> >> thread-local (value-passing) labelled transition system. We then
>> >> combine the resulting thread semantics with an operational
>> >> non-multi-copy-atomic storage subsystem, as in our PLDI11 Power model,
>> >> to be sound w.r.t. Power and ARM hardware.  The result is (we think)
>> >> attractively simple to understand.   The big question is whether that
>> >> assumption is reasonable (or can be made so),  and we'd welcome
>> >> thoughts on that.
>> >>
>> >> The semantics is described here:
>> >>
>> >> http://www.cl.cam.ac.uk/~pes20/cpp/notes44.html  (Jean Pichon + Peter
>> >> Sewell)
>> >>
>> >> (we've also got a formal version in Lem and a command-line prototype
>> >> tool for exploring it).
>> >>
>> >> Comments would be very welcome
>> >
>> > I like the restriction that a given action can be hoisted ahead of
>> > conditionals only if that action is performed in all cases, and if there
>> > are no preceding fences or preceding actions to the same location as
>> > the action in question.
>>
>> > I take it that the need to exclude value-range
>> > analysis is also behind your discomfort with Linus Torvalds's and
>> > Torvald
>> > Riegel's recent proposal for dependency ordering.
>>
>> I've not got that swapped in right now, but I think it suffered from
>> different issues.  I'm perfectly fine with excluding value-range
>> analysis so long
>> as compiler folk can be persuaded to.
>>
>> > On status of variables, some ways of approximating the different sets:
>> >
>> > o       Auto variables whose address is never taken are local.
>> >         (Yes, you could infer one auto variable's address from
>> >         one of its counterparts, but when you do that you are
>> >         taking your program's life into your hands anyway!)
>> >
>> > o       Variables marked __thread whose addresses are not taken
>> >         are local.  The Linux kernel has a similar notion of
>> >         per-CPU variables.
>> >
>> > o       Memory just allocated is private until an address is
>> >         published somewhere.
>> >
>> > All that aside, I don't know of any way to tell the compiler that a
>> > given location that was previously public has now been privatized
>> > (such privatization is common in RCU code).
>> >
>> > On memory action elimination and introduction, when you say (for
>> > example)
>> > read-after-write, how long after the write?
>> > I am guessing that if
>> > there is a fence or a RMW atomic between the read and the write, the
>> > read cannot be eliminated.
>>
>> We'd have to think (in the little language that Jean and I have worked
>> out some details, there are no RMWs).
>>
>> > One simples example that would be broken by
>> > overly energetic read-after-write elimination is the queued spinlock,
>> > which initialized its queue element, then spins waiting for the lock
>> > holder to change the value.
>>
>> One might perhaps want to distinguish between eliminating finitely
>> many occurrences of a read and eliminating all of them.
>>
>> >  (But you could reasonably argue that such
>> > uses should have volatile semantics.)
>> >
>> > On non-atomics, you lost me on the "if there is a data race on
>> > non-atomics
>> > performed out-of-order, then there is one with non-atomics performed
>> > in order".  This might be because I am assuming that non-out-of-order
>> > non-atomics might benefit from hardware control dependencies, and
>> > perhaps
>> > you are excluding those.
>>
>> I think so - our semantics doesn't explicitly know about h/w control
>> dependency preservation.
>>
>> thanks,
>> peter
>>
>>
>> >  For C/C++, data races on non-atomic accesses
>> > result in undefined behavior, which can certainly include
>> > out-of-thin-air
>> > values, right?
>> >
>> >                                                         Thanx, Paul
>> >
>
>