[concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Hans Boehm boehm at acm.org
Tue Nov 25 19:15:36 UTC 2014


It seems to me that a (dubiously named) loadFence is intended to have
essentially the same semantics as the (perhaps slightly less dubiously
named) C++ atomic_thread_fence(memory_order_acquire), and a storeFence
matches atomic_thread_fence(memory_order_release).  The C++ standard and,
even more so, Mark Batty's work have a precise definition of what those
mean in terms of implied "synchronizes with" relationships.
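To make the C++ side of that analogy concrete, here is a minimal sketch of the usual message-passing idiom written with standalone fences. It is an illustration only, not code from HotSpot or the webrev: the names payload, ready, producer, consumer, run_message_passing and check_message_passing are all hypothetical, and the comments pairing each fence with storeFence/loadFence restate the analogy above rather than any specified equivalence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names throughout; this only sketches the C++ fences.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag, accessed only with relaxed operations

void producer() {
    payload = 42;                                         // A: plain store
    std::atomic_thread_fence(std::memory_order_release);  // the role attributed to storeFence
    ready.store(true, std::memory_order_relaxed);         // B: relaxed store after the fence
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {      // C: relaxed load, spins until it reads B
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // the role attributed to loadFence
    return payload;                                       // D: ordered after A
}

// Runs one producer and one consumer and returns what the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}

// Usage check: the consumer must observe the producer's payload.
void check_message_passing() {
    assert(run_message_passing() == 42);
}
```

Because C reads the value written by B, the release fence synchronizes with the acquire fence, so A happens before D and the plain read of payload is race-free. Neither fence is attached to a particular access, which is the property the rest of this message is concerned with.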

It looks to me like this whole implementation model for volatiles in terms
of fences is fundamentally doomed, and it probably makes more sense to get
rid of it rather than spending time on renaming it (though we just did the
latter in Android to avoid similar confusion about semantics).  It's
fundamentally incompatible with the way volatiles/atomics are intended to
be implemented on ARMv8 (and Itanium), which I think get this much closer
to right than traditional fence-based ISAs.

I'm no hardware architect, but fundamentally it seems to me that

load x
acquire_fence

imposes a much more stringent constraint than

load_acquire x

Consider the case in which the load from x is an L1 hit, but a preceding
load (from say y) is a long-latency miss.  If we enforce ordering by just
waiting for completion of prior operations, the former has to wait for the
load from y to complete; while the latter doesn't.  I find it hard to
believe that this doesn't leave an appreciable amount of performance on the
table, at least for some interesting microarchitectures.
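The two forms can be written side by side in C++. This is a sketch under stated assumptions: the variables x and y and the function names load_then_fence, load_acquire_only and check_both_forms are hypothetical, and both functions return the same value; they differ only in how much ordering they request.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical variables standing in for the x and y of the discussion.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Form 1: "load x; acquire_fence".
// The standalone fence applies to every earlier load in this thread,
// so the load of y is covered by it as well as the load of x.
int load_then_fence() {
    int a = y.load(std::memory_order_relaxed);  // could be a long-latency miss
    int b = x.load(std::memory_order_relaxed);  // could be an L1 hit
    std::atomic_thread_fence(std::memory_order_acquire);
    return a + b;
}

// Form 2: "load_acquire x".
// The acquire ordering is attached to the load of x only;
// the earlier relaxed load of y is not constrained by it.
int load_acquire_only() {
    int a = y.load(std::memory_order_relaxed);
    int b = x.load(std::memory_order_acquire);
    return a + b;
}

// Usage check: single-threaded, both forms read the same values.
void check_both_forms() {
    x.store(1);
    y.store(2);
    assert(load_then_fence() == 3);
    assert(load_acquire_only() == 3);
}
```

On ARMv8, compilers typically turn the second form into a single load-acquire instruction (ldar) and the first into an ordinary load followed by a separate barrier instruction, which is where the extra waiting described above would come from.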

Along similar lines, it seems to me that Doug's JSR-133 cookbook was a great
contribution originally, in that it finally pinned down understandable
rules for implementors, while providing a lot of useful intuition.  At this
point, it is still quite useful for intuition, but
http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html (remembering that
Java volatile = C++ SC atomic) is a much better guide for implementors,
especially on Power.  The SPARC-like fence primitives used in the cookbook
are no longer reflective of the most widely used architectures.  And they
do not reflect the fence types actually needed by Java.  In addition, for
better or worse, fencing requirements on at least Power are actually driven
as much by store atomicity issues as by the ordering issues discussed in
the cookbook.  This was not understood in 2005, and unfortunately doesn't
seem to be amenable to the kind of straightforward explanation found in
Doug's cookbook.
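Store atomicity is commonly illustrated with the independent-reads-of-independent-writes (IRIW) litmus test; choosing it here is an assumption about what best illustrates the point, since the message does not name a specific test. The sketch below uses sequentially consistent C++ atomics, the counterpart of Java volatile mentioned above. The names ix, iy, iriw_forbidden_outcome_seen and check_iriw are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared locations; all accesses default to memory_order_seq_cst.
std::atomic<int> ix{0};
std::atomic<int> iy{0};

// Runs IRIW once: two independent writers, two readers that read the two
// locations in opposite orders. Returns true if the readers disagree about
// the order in which the two stores became visible.
bool iriw_forbidden_outcome_seen() {
    ix.store(0);
    iy.store(0);
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    std::thread w1([] { ix.store(1); });
    std::thread w2([] { iy.store(1); });
    std::thread rd1([&] { r1 = ix.load(); r2 = iy.load(); });
    std::thread rd2([&] { r3 = iy.load(); r4 = ix.load(); });

    w1.join();
    w2.join();
    rd1.join();
    rd2.join();

    // rd1 saw the store to ix but not iy; rd2 saw the store to iy but not ix.
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}

// Usage check: with seq_cst atomics the disagreeing outcome must never occur.
void check_iriw() {
    for (int i = 0; i < 100; ++i) {
        assert(!iriw_forbidden_outcome_seen());
    }
}
```

With sequentially consistent atomics all four threads must agree on a single total order of the two stores, so the function always returns false. Nothing in the test is about reordering within one thread: each reader's two loads stay in program order. What has to be ruled out is one store becoming visible to different readers at different times, which is the kind of requirement that the ordering rules of a fence cookbook do not capture.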

Hans

On Tue, Nov 25, 2014 at 10:16 AM, DT <dt at flyingtroika.com> wrote:

> From time to time I see comments in the JVM sources referencing membars and
> fences. Would you say that they are used interchangeably, having the same
> meaning but for different CPU architectures?
>
> > On Nov 25, 2014, at 6:04 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> >
> > Hi Martin,
> >
> > Thanks for looking into this.
> >
> >      * Currently hotspot's implementation of a Java language-level volatile
> >      * store has the same effect as a storeFence followed by a relaxed store,
> >      * although that may be a little stronger than needed.
> >
> > IIUC, to emulate hotspot's volatile store you will need to say that a
> > fullFence immediately follows the relaxed store.
> >
> > The bit that always confuses me about release and acquire is that ordering
> > is restricted to one direction, as talked about in orderAccess.hpp [1]. So
> > for a release, accesses prior to the release cannot move below it, but
> > accesses succeeding the release can move above it. And that seems to apply
> > to Unsafe.storeFence [2] (acting like a monitor exit). Is that contrary to
> > C++ release fences, where ordering is restricted both to prior and
> > succeeding accesses? [3]
> >
> > So what about the following?
> >
> >  a = r1; // Cannot move below the fence
> >  Unsafe.storeFence();
> >  b = r2; // Can move above the fence?
> >
> > Paul.
> >
> > [1] In orderAccess.hpp
> > // Execution by a processor of release makes the effect of all memory
> > // accesses issued by it previous to the release visible to all
> > // processors *before* the release completes.  The effect of subsequent
> > // memory accesses issued by it *may* be made visible *before* the
> > // release.  I.e., subsequent memory accesses may float above the
> > // release, but prior ones may not float below it.
> >
> > [2] In memnode.hpp
> > // "Release" - no earlier ref can move after (but later refs can move
> > // up, like a speculative pipelined cache-hitting Load).  Requires
> > // multi-cpu visibility.  Inserted independent of any store, as required
> > // for intrinsic sun.misc.Unsafe.storeFence().
> > class StoreFenceNode: public MemBarNode {
> > public:
> >  StoreFenceNode(Compile* C, int alias_idx, Node* precedent)
> >    : MemBarNode(C, alias_idx, precedent) {}
> >  virtual int Opcode() const;
> > };
> >
> > [3] http://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/
> >
> >> On Nov 25, 2014, at 1:47 AM, Martin Buchholz <martinrb at google.com> wrote:
> >>
> >> OK, I worked in some wording for comparison with volatiles.
> >> I believe you when you say that the semantics of the corresponding C++
> >> fences are slightly different, but it's rather subtle - can we say
> >> anything more than "closely related to"?
> >>
> >> On Mon, Nov 24, 2014 at 1:29 PM, Aleksey Shipilev
> >> <aleksey.shipilev at oracle.com> wrote:
> >>> Hi Martin,
> >>>
> >>>> On 11/24/2014 11:56 PM, Martin Buchholz wrote:
> >>>> Review carefully - I am trying to learn about fences by explaining them!
> >>>> I have borrowed some wording from my reviewers!
> >>>>
> >>>> https://bugs.openjdk.java.net/browse/JDK-8065804
> >>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk9/fence-intrinsics/
> >>>
> >>> I think "implies the effect of C++11" is too strong wording. "related"
> >>> might be more appropriate.
> >>>
> >>> See also comments here for connection with "volatiles":
> >>> https://bugs.openjdk.java.net/browse/JDK-8038978
> >>>
> >>> Take note of Hans' correction that fences generally imply more than
> >>> volatile load/store, but since you are listing the related things in
> >>> the docs, I think the "native" Java example is good to have.
> >>>
> >>> -Aleksey.
> >> _______________________________________________
> >> Concurrency-interest mailing list
> >> Concurrency-interest at cs.oswego.edu
> >> http://cs.oswego.edu/mailman/listinfo/concurrency-interest