[concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics
Oleksandr Otenko
oleksandr.otenko at oracle.com
Tue Dec 9 20:34:25 UTC 2014
Is the thorn the many allowed outcomes, or the single disallowed
outcome? (E.g., is order consistency too strict for stores with no
synchronizes-with between them?)
Alex
On 26/11/2014 02:10, David Holmes wrote:
> Hi Hans,
> Given that IRIW is a thorn in everyone's side, has no known useful
> benefit, and can hopefully be killed off in the future, let's not get
> bogged down in IRIW. But none of what you say below relates to
> multi-copy-atomicity.
> Cheers,
> David
>
> -----Original Message-----
> *From:* hjkhboehm at gmail.com [mailto:hjkhboehm at gmail.com]*On Behalf
> Of *Hans Boehm
> *Sent:* Wednesday, 26 November 2014 12:04 PM
> *To:* dholmes at ieee.org
> *Cc:* Stephan Diestelhorst; concurrency-interest at cs.oswego.edu;
> core-libs-dev
> *Subject:* Re: [concurrency-interest] RFR: 8065804:
> JEP171:Clarifications/corrections for fence intrinsics
>
> To be concrete here, on Power, loads can normally be ordered by an
> address dependency or a light-weight fence (lwsync). However,
> neither is enough to prevent the questionable outcome for IRIW,
> since neither ensures that the stores in T1 and T2 will be made
> visible to other threads in a consistent order. That outcome can
> be prevented by using heavyweight fence (sync) instructions
> between the loads instead. Peter Sewell's group concluded that to
> enforce correct volatile behavior on Power, you essentially need a
> heavyweight fence between every pair of volatile operations.
> That cannot be understood based on simple ordering constraints.
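
The IRIW shape Hans describes can be written down as a small Java litmus
sketch (class and field names here are invented for illustration; a single
run only samples one interleaving, so tools like jcstress are the real way
to hammer such tests). With volatile fields, the JMM's total order over
volatile accesses forbids the outcome that plain loads on Power can
exhibit:

```java
// IRIW litmus sketch: two writers, two readers observing in opposite
// orders. With volatile, the JMM forbids the outcome
// r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0, which is what the sync
// fences buy on Power.
public class Iriw {
    static volatile int foo = 0, bar = 0;
    static int r1, r2, r3, r4;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> foo = 1);
        Thread t2 = new Thread(() -> bar = 1);
        Thread t3 = new Thread(() -> { r1 = bar; r2 = foo; });
        Thread t4 = new Thread(() -> { r3 = foo; r4 = bar; });
        t1.start(); t2.start(); t3.start(); t4.start();
        t1.join(); t2.join(); t3.join(); t4.join();
        // a compliant JVM can never take this branch
        boolean forbidden = r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
        System.out.println(forbidden ? "FORBIDDEN" : "ok");
    }
}
```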
>
> As Stephan pointed out, there are similar issues on ARM, but
> they're less commonly encountered in a Java implementation. If
> you're lucky, you can get to the right implementation recipe by
> looking only at reordering, I think.
>
>
> On Tue, Nov 25, 2014 at 4:36 PM, David Holmes
> <davidcholmes at aapt.net.au <mailto:davidcholmes at aapt.net.au>> wrote:
>
> Stephan Diestelhorst writes:
> >
> > David Holmes wrote:
> > > Stephan Diestelhorst writes:
> > > > On Tuesday, 25 November 2014 at 11:15:36, Hans Boehm wrote:
> > > > > I'm no hardware architect, but fundamentally it seems to me
> > > > > that
> > > > >
> > > > >     load x
> > > > >     acquire_fence
> > > > >
> > > > > imposes a much more stringent constraint than
> > > > >
> > > > >     load_acquire x
> > > > >
> > > > > Consider the case in which the load from x is an L1 hit, but a
> > > > > preceding load (from, say, y) is a long-latency miss. If we
> > > > > enforce ordering by just waiting for completion of the prior
> > > > > operation, the former has to wait for the load from y to
> > > > > complete, while the latter doesn't. I find it hard to believe
> > > > > that this doesn't leave an appreciable amount of performance on
> > > > > the table, at least for some interesting microarchitectures.
> > > >
> > > > I agree, Hans, that this is a reasonable assumption. Load_acquire x
> > > > does allow roach motel, whereas the acquire fence does not.
> > > >
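
The two shapes under discussion later became expressible directly in Java
via VarHandles (JDK 9+). A single-threaded sketch purely to show the API
surface — the class name and field are invented, and this says nothing
about the relative cost of the two forms:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireShapes {
    static int x = 42;
    static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findStaticVarHandle(AcquireShapes.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int[] readBoth() {
        // Shape 1: plain load, then a free-standing acquire fence.
        // The fence orders *every* earlier load before later accesses,
        // so conceptually it must also wait for unrelated in-flight
        // loads (e.g. Hans's long-latency miss on y).
        int v1 = (int) X.get();
        VarHandle.acquireFence();

        // Shape 2: the load itself carries acquire semantics. Only
        // this load is ordered; later accesses may still be moved
        // above unrelated earlier loads (roach motel).
        int v2 = (int) X.getAcquire();
        return new int[] { v1, v2 };
    }

    public static void main(String[] args) {
        int[] r = readBoth();
        System.out.println(r[0] + " " + r[1]);
    }
}
```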
> > > > > In addition, for better or worse, fencing requirements on at
> > > > > least Power are actually driven as much by store atomicity
> > > > > issues as by the ordering issues discussed in the cookbook.
> > > > > This was not understood in 2005, and unfortunately doesn't
> > > > > seem to be amenable to the kind of straightforward explanation
> > > > > given in Doug's cookbook.
> > > >
> > > > Coming from a strongly ordered architecture to a weakly ordered
> > > > one myself, I also needed some mental adjustment about store
> > > > (multi-copy) atomicity. I can imagine others will be unaware of
> > > > this difference, too, even in 2014.
> > >
> > > Sorry, I'm missing the connection between fences and multi-copy
> > > atomicity.
> >
> > One example is the classic IRIW. With non-multi-copy-atomic stores,
> > but ordered (say through a dependency) loads, in the following
> > example:
> >
> > Memory: foo = bar = 0
> >  _T1_        _T2_        _T3_                        _T4_
> >  st (foo),1  st (bar),1  ld r1,(bar)                 ld r3,(foo)
> >                          <addr dep / local "fence">  <addr dep>
> >                          ld r2,(foo)                 ld r4,(bar)
> >
> > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy
> > atomic machines. On TSO boxes, this is not possible. That means
> > that the memory fence that will prevent such a behaviour (DMB on
> > ARM) needs to carry some additional oomph in ensuring multi-copy
> > atomicity, or rather prevent you from seeing it (which is the same
> > thing).
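
A message-passing sketch of the fence intrinsics this RFR concerns, via
their public VarHandle successors (JDK 9+; class and field names invented):
the release fence orders the data store before the flag store, and the
acquire fence orders the flag load before the data load. Note, per the
discussion, that these lightweight fences (lwsync/DMB-ld flavoured) do
not by themselves restore multi-copy-atomic observation; on Power only a
full fence (sync) between a reader's two loads forbids the IRIW outcome.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class FenceSketch {
    static int data;        // plain field, published via fences
    static boolean ready;   // flag, accessed in opaque mode
    static int seen = -1;   // what the reader observed
    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup().findStaticVarHandle(
                    FenceSketch.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;
            // storeFence analogue: keeps the data store before the
            // flag store (typically lwsync on Power, dmb ish on ARMv8)
            VarHandle.releaseFence();
            READY.setOpaque(true);
        });
        Thread reader = new Thread(() -> {
            while (!(boolean) READY.getOpaque()) { Thread.onSpinWait(); }
            // loadFence analogue: keeps the flag load before the data
            // load (typically lwsync on Power, dmb ishld on ARMv8)
            VarHandle.acquireFence();
            seen = data;
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.println("reader saw " + seen);
    }
}
```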
>
> I take it as given that any code for which you may have ordering
> constraints must first have basic atomicity properties for loads and
> stores. I would not expect any kind of fence to add
> multi-copy-atomicity where there was none.
>
> David
>
> > Stephan
> >
> > _______________________________________________
> > Concurrency-interest mailing list
> > Concurrency-interest at cs.oswego.edu
> <mailto:Concurrency-interest at cs.oswego.edu>
> > http://cs.oswego.edu/mailman/listinfo/concurrency-interest
More information about the core-libs-dev mailing list