[jmm-dev] Jmm revision status

Fri Jul 18 05:43:06 UTC 2014

On 18 July 2014 00:57, Hans Boehm <boehm at acm.org> wrote:
> A few other updates:  Brian and I had a paper in MSPC 14
> (http://dl.acm.org/citation.cfm?id=2618134) that mostly summarizes the
> out-of-thin-air issues and solutions based on prohibiting store->load
> reordering.  I would argue that those are still the most practical solutions
> we currently have.
>
> One of my colleagues at Google points out that my earlier fear that bogus
> branches needed to enforce load->store ordering would tie up branch
> prediction resources should be unfounded.  It should be easy to arrange for
> these branches to be statically predicted correctly, in which case it
> appears that no prediction resources are used.

> I think we still need real measurements of the cost

Agreed for the last point.  For C I'm a bit skeptical; for Java I
wouldn't like to even guess.

> , which, at least for
> Java, I would expect to greatly depend on the cleverness of the compiler in
> delaying branches and avoiding unnecessary ones.
>
> Torvald Riegel and Paul McKenney are trying to turn C++11/C11
> memory_order_consume into something useful, and have been running into some
> of the same problems with definition of dependencies as we have here.

There's also a bit of a question right now about "fake" data and
control dependency preservation on ARM; hopefully that will become
clear soon.

> Although at most marginally relevant for Java, we also became aware of an
> ARM erratum
> (http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf,
> perhaps discovered by some of the other participants here?)

(y)

>, that seems to
> effectively reduce the cost of prohibiting load->store reordering on ARMv7
> for C++ memory_order_relaxed to zero.  Apparently a substantial fraction of
> ARMv7 cores have a hardware erratum that requires a fence for
> memory_order_relaxed loads anyway.  Otherwise loads from the same location
> may be reordered, which is disallowed for C++ memory_order_relaxed, but
> allowed for Java.  Thus any object code that is intended to correctly
> support memory_order_relaxed on these processors should already prohibit
> load->store reordering as a side-effect.  For C and C++, I expect that
> realistically applies to all 32-bit ARM code.  Unfortunately, the required
> workaround seems appreciably more expensive than what we would need to just
> enforce load->store ordering, since it needs an actual fence.

I do wonder how widely that workaround is actually deployed - any data?

> As mentioned, this does not directly change the Java situation.  It also
> does not affect 64-bit executables intended to run on ARMv8.

Indeed

best,
Peter

> Hans
>
>
>
> On Thu, Jul 17, 2014 at 1:11 PM, Peter Sewell <Peter.Sewell at cl.cam.ac.uk>
> wrote:
>>
>> On 17 July 2014 21:00, Doug Lea <dl at cs.oswego.edu> wrote:
>> >
>> > Things were moving along rather nicely. And then ... nothing.
>>
>> We have been working in the meantime, but (unsurprisingly) it's proved
>> tricky...  we have a draft paper at
>>
>> http://www.cl.cam.ac.uk/~pes20/cpp/c_concurrency_challenges.pdf
>>
>> that has some good news and bad news, mostly in the C context (work by
>> Mark Batty, Kayvan Memarian, Kyndylan Nienhuis, Jean Pichon, and
>> myself).  The former consist of a machine-checked proof of DRF-SC for
>> the C/C++11 model and a machine-checked-equivalent operational C/C++11
>> model.  For the latter, we describe our observations that there's no
>> fix to the thin-air problems (preserving existing optimisations) in a
>> per-candidate-execution style, and that this is not just a problem for
>> relaxed atomics - the mix of nonatomic and (eg) SC atomic accesses
>> that is allowed in C also leads to thin-air difficulties (this latter
>> isn't necessarily relevant to Java, of course). Then we explore
>> further the operational construction we sent a mail around earlier -
>> it's interesting in that it copes with the thin-air examples (and in
>> general with reordering) very nicely, but difficulties with other
>> optimisations suggest it's not really a solution for C.   I guess also
>> not for Java, but I don't have a good intuition for how much
>> optimisation you might be prepared to give up.
>>
>> Meanwhile (we've just been talking at a couple of meetings), Viktor
>> and Francesco et al. have identified yet other problems with C/C++11,
>> especially related to whether source-to-source (or IR-to-IR)
>> optimisations are sound w.r.t. it. Some of these are fixable, but
>> others involve the same basic thin-air problem.
>>
>> Right now, I'm not aware of any fleshed out credible proposal for a
>> decent model that allows something like relaxed atomics (implementable
>> with just plain accesses) and current optimisations...    perhaps
>> Alan's event-structure semantics will give us a way forward.
>>
>> > My sense is that people following things closely suddenly became less
>> > optimistic that we will arrive at something simple and beautiful and
>> > readily understandable after seeing Peter Sewell's proposed amendments
>> > to C++/C11 (http://www.cl.cam.ac.uk/~pes20/cpp/notes44.html) and Alan
>> > Jeffrey's unsimple follow-ups to his simple fresh start. See March
>> > list archives at
>> > http://mail.openjdk.java.net/pipermail/jmm-dev/2014-March/thread.html
>> >
>> > This seems to happen all the time with memory models. Practical
>> > necessities surrounding processors and optimizers lead to messiness.
>> > In particular, most people wanted to get rid of JSR133 user-hostile
>> > "justification sequences" and the like, but this is now far from a
>> > sure thing. (Peter's approach amounts to a special form of them.)
>>
>> Not really the same kind of thing, I think - that operational
>> construction involves more than just a single candidate execution, but
>> not the bait-and-switch nature of JSR133.
>>
>> best,
>> Peter
>>
>>
>> > All ideas would be welcome on how we can recover forward progress on
>> > the core model. Especially since we do have some other updates
>> > more-or-less conceptually ready to adapt to them, including actual
>> > specs for "enhanced-volatile" acquire/release and other intrinsics,
>> > and replacing the "final fields" specs with simpler constructor
>> > release guarantees.
>> >
>> > -Doug
>
>