[jmm-dev] Jmm revision status

Fri Jul 18 06:19:28 UTC 2014

> On Thu, Jul 17, 2014 at 10:43 PM, Peter Sewell <Peter.Sewell at cl.cam.ac.uk>
> wrote:
> >
> > On 18 July 2014 00:57, Hans Boehm <boehm at acm.org> wrote:
> > > A few other updates:  Brian and I had a paper in MSPC 14
> > > (http://dl.acm.org/citation.cfm?id=2618134) that mostly summarizes the
> > > out-of-thin-air issues and solutions based on prohibiting load->store
> > > reordering.  I would argue that those are still the most practical
> > > solutions we currently have.
> > >
> > > One of my colleagues at Google points out that my earlier fear that
> > > bogus branches needed to enforce load->store ordering would tie up
> > > branch prediction resources should be unfounded.  It should be easy to
> > > arrange for these branches to be statically predicted correctly, in
> > > which case it appears that no prediction resources are used.
> >
> > > I think we still need real measurements of the cost
> >
> > Agreed for the last point.  For C I'm a bit skeptical; for Java I
> > wouldn't like to even guess.
>
For C, if we look only at existing implementations, it seems that the only
costs are the loss of some compiler transformations on relaxed operations
and the overhead on 64-bit ARMv8.  The former seems trivial; I suspect most
compilers don't reorder atomic accesses anyway.
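
To make the restriction concrete, here is a minimal sketch of the
load-buffering shape it concerns, assuming C++11 atomics.  The names
LbCounts and run_load_buffering are hypothetical and are not taken from
the paper or from any implementation.  Under the current C++11 rules the
outcome r1 == 1 && r2 == 1 is permitted for relaxed operations; it would
be forbidden if load->store reordering were prohibited.  The harness only
counts outcomes, so it can show that a reordering happened but cannot
prove that an implementation never reorders.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Outcome counts for the load-buffering (LB) litmus test.
struct LbCounts {
    long both_zero = 0;  // r1 == 0 && r2 == 0
    long one_set = 0;    // exactly one of r1, r2 is 1
    long both_one = 0;   // r1 == 1 && r2 == 1: requires load->store reordering
};

// Runs the LB shape `iterations` times and tallies the outcomes.
// Hypothetical helper for illustration only.
LbCounts run_load_buffering(long iterations) {
    LbCounts counts;
    for (long i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0;
        std::thread t1([&] {
            r1 = x.load(std::memory_order_relaxed);  // load ...
            y.store(1, std::memory_order_relaxed);   // ... followed by a store
        });
        std::thread t2([&] {
            r2 = y.load(std::memory_order_relaxed);
            x.store(1, std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        // Only 0 (initial value) and 1 (the single stored value) exist.
        assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
        if (r1 == 1 && r2 == 1) {
            ++counts.both_one;
        } else if (r1 == 0 && r2 == 0) {
            ++counts.both_zero;
        } else {
            ++counts.one_set;
        }
    }
    return counts;
}
```

Calling run_load_buffering(2000) from main and inspecting both_one gives a
rough check of whether a given compiler and machine ever exhibit the
reordering.  A count of zero is only evidence, not a guarantee.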

For Java, I agree.

>
> > > , which, at least for
> > > Java, I would expect to greatly depend on the cleverness of the
> > > compiler in delaying branches and avoiding unnecessary ones.
> > >
> > > Torvald Riegel and Paul McKenney are trying to turn C++11/C11
> > > memory_order_consume into something useful, and have been running into
> > > some of the same problems with definition of dependencies as we have
> > > here.
> >
> > There's also a bit of a question right now about "fake" data and
> > control dependency preservation on ARM; hopefully that will become
> > clear soon.
>
> > Although at most marginally relevant for Java, we also became aware of an
> > ARM erratum
> > (http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf,
> > perhaps discovered by some of the other participants here?)
>
> (y)
>
> >, that seems to
> > effectively reduce the cost of prohibiting load->store reordering on
> > ARMv7 for C++ memory_order_relaxed to zero.  Apparently a substantial
> > fraction of ARMv7 cores have a hardware erratum that requires a fence for
> > memory_order_relaxed loads anyway.  Otherwise loads from the same
> > location may be reordered, which is disallowed for C++
> > memory_order_relaxed, but allowed for Java.  Thus any object code that is
> > intended to correctly support memory_order_relaxed on these processors
> > should already prohibit load->store reordering as a side-effect.  For C
> > and C++, I expect that realistically applies to all 32-bit ARM code.
> > Unfortunately, the required workaround seems appreciably more expensive
> > than what we would need to just enforce load->store ordering, since it
> > needs an actual fence.
>
> I do wonder how widely that workaround is actually deployed - any data?
>

I suspect it's not.  But I think our task is to look at performance in a
currently hypothetical world where implementations are actually correct in
this respect, and where we no longer see random memory-model-induced
failures and attribute them to alpha particles, or whatever.  I think we're
gradually moving towards that hypothetical world, but we're not that close
yet.  (I would be surprised if there were any real large systems for which
this ARM bug is the most common cause of memory-model-related failures.)
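
For reference, the same-location guarantee that the erratum breaks can be
sketched as follows, assuming C++11 atomics.  The name
count_coherence_violations is hypothetical.  The writer stores strictly
increasing values, so if two relaxed loads of the same location by one
thread are kept in order, the second can never observe an older value
than the first.  C++ requires this for memory_order_relaxed; Java does not
require it for ordinary fields.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Counts how often a later relaxed load of x observed an older value than
// an earlier relaxed load of x in the same thread.  On a correct C++
// implementation the result should be 0.  Hypothetical helper.
long count_coherence_violations(int last_value) {
    assert(last_value > 0);
    std::atomic<int> x{0};
    long violations = 0;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (int v = 1; v <= last_value; ++v) {
            x.store(v, std::memory_order_relaxed);  // values only increase
        }
    });
    std::thread reader([&] {
        int first = 0;
        while (first < last_value) {
            first = x.load(std::memory_order_relaxed);
            int second = x.load(std::memory_order_relaxed);
            if (second < first) {
                ++violations;  // the two loads appear to have been reordered
            }
        }
    });

    writer.join();
    reader.join();
    return violations;
}
```

A nonzero result from count_coherence_violations(100000) would indicate
that same-location loads were reordered, which is the behavior the fence
workaround is meant to rule out.  A zero result does not show that the
workaround is deployed, only that no violation was seen in that run.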

Hans

> > As mentioned, this does not directly change the Java situation.  It also
> > does not affect 64-bit executables intended to run on ARMv8.
>
> Indeed
> best,
> Peter
>
>
> > Hans
> >
>
>