[jmm-dev] Jmm revision status

Thu Jul 17 22:57:20 UTC 2014

A few other updates:  Brian and I had a paper in MSPC 14 (
http://dl.acm.org/citation.cfm?id=2618134) that mostly summarizes the
out-of-thin-air issues and solutions based on prohibiting store->load
reordering.  I would argue that those are still the most practical
solutions we currently have.

One of my colleagues at Google points out that my earlier fear that bogus
branches needed to enforce load->store ordering would tie up branch
prediction resources should be unfounded.  It should be easy to arrange for
these branches to be statically predicted correctly, in which case it
appears that no prediction resources are used.

I think we still need real measurements of the cost, which, at least for
Java, I would expect to greatly depend on the cleverness of the compiler in
delaying branches and avoiding unnecessary ones.

Torvald Riegel and Paul McKenney are trying to turn C++11/C11
memory_order_consume into something useful, and have been running into some
of the same problems with definition of dependencies as we have here.

Although at most marginally relevant for Java, we also became aware of an
ARM erratum (
http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf,
perhaps discovered by some of the other participants here?), that seems to
effectively reduce the cost of prohibiting load->store reordering on ARMv7
for C++ memory_order_relaxed to zero.  Apparently a substantial fraction of
ARMv7 cores have a hardware erratum that requires a fence for
memory_order_relaxed loads anyway.  Otherwise loads from the same location
may be reordered, which is disallowed for C++ memory_order_relaxed, but
allowed for Java.  Thus any object code that is intended to correctly
support memory_order_relaxed on these processors should already prohibit
load->store reordering as a side-effect.  For C and C++, I expect that
realistically applies to all 32-bit ARM code.  Unfortunately, the required
workaround seems appreciably more expensive than what we would need to just
enforce load->store ordering, since it needs an actual fence.

As mentioned, this does not directly change the Java situation.  It also
does not affect 64-bit executables intended to run on ARMv8.

Hans

On Thu, Jul 17, 2014 at 1:11 PM, Peter Sewell <Peter.Sewell at cl.cam.ac.uk>
wrote:

> On 17 July 2014 21:00, Doug Lea <dl at cs.oswego.edu> wrote:
> >
> > Things were moving along rather nicely. And then ... nothing.
>
> We have been working in the meantime, but (unsurprisingly) it's proved
> tricky...  we have a draft paper at
>
> http://www.cl.cam.ac.uk/~pes20/cpp/c_concurrency_challenges.pdf
>
> that has some good news and bad news, mostly in the C context (work by
> Mark Batty, Kayvan Memarian, Kyndylan Nienhuis, Jean Pichon, and
> myself).  The former consist of a machine-checked proof of DRF-SC for
> the C/C++11 model and a machine-checked-equivalent operational C/C++11
> model.  For the latter, we describe our observations that there's no
> fix to the thin-air problems (preserving existing optimisations) in a
> per-candidate-execution style, and that this is not just a problem for
> relaxed atomics - the mix of nonatomic and (eg) SC atomic accesses
> that is allowed in C also leads to thin-air difficulties (this latter
> isn't necessarily relevant to Java, of course). Then we explore
> further the operational construction we sent a mail around earlier -
> it's interesting in that it copes with the thin-air examples (and in
> general with reordering) very nicely, but difficulties with other
> optimisations suggest it's not really a solution for C.   I guess also
> not for Java, but I don't have a good intuition for how much
> optimisation you might be prepared to give up.
>
> Meanwhile (we've just been talking at a couple of meetings), Viktor
> and Francesco et al. have identified yet other problems with C/C++11,
> especially related to whether source-to-source (or IR-to-IR)
> optimisations are sound w.r.t. it. Some of these are fixable, but
> others involve the same basic thin-air problem.
>
> Right now, I'm not aware of any fleshed out credible proposal for a
> decent model that allows something like relaxed atomics (implementable
> with just plain accesses) and current optimisations...    perhaps
> Alan's event-structure semantics will give us a way forward.
>
> > My sense is that people following things closely suddenly became less
> > optimistic that we will arrive at something simple and beautiful and
> > readily understandable after seeing Peter Sewell's proposed amendments
> > to C++/C11 (http://www.cl.cam.ac.uk/~pes20/cpp/notes44.html) and Alan
> > Jeffrey's unsimple follow-ups to his simple fresh start. See March
> > list archives at
> > http://mail.openjdk.java.net/pipermail/jmm-dev/2014-March/thread.html
> >
> > This seems to happen all the time with memory models. Practical
> > necessities surrounding processors and optimizers lead to messiness.
> > In particular, most people wanted to get rid of JSR133 user-hostile
> > "justification sequences" and the like, but this is now far from a
> > sure thing. (Peter's approach amounts to a special form of them.)
>
> Not really the same kind of thing, I think - that operational
> construction involves more than just a single candidate execution, but
> not the bait-and-switch nature of JSR133.
>
> best,
> Peter
>
>
> > All ideas would be welcome on how we can recover forward progress on
> > the core model. Especially since we do have some other updates
> > more-or-less conceptually ready to adapt to them, including actual
> > specs for "enhanced-volatile" acquire/release and other intrinsics,
> > and replacing the "final fields" specs with simpler constructor
> > release guarantees.
> >
> > -Doug
>