[concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics
Oleksandr Otenko
oleksandr.otenko at oracle.com
Tue Dec 9 21:53:32 UTC 2014
Yes, I do understand the reader needs barriers, too. I guess I was
wondering more why the reader would need something stronger than what
dependencies etc. could enforce. I guess I'll read what Martin forwarded
first.
Alex
On 09/12/2014 21:37, David Holmes wrote:
> See my earlier response to Martin. The reader has to force a
> consistent view of memory - the writer can't as the write escapes
> before it can issue the barrier.
> David
>
> -----Original Message-----
> From: concurrency-interest-bounces at cs.oswego.edu
> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
> Oleksandr Otenko
> Sent: Wednesday, 10 December 2014 6:04 AM
> To: Hans Boehm; dholmes at ieee.org
> Cc: core-libs-dev; concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] RFR: 8065804:
> JEP171:Clarifications/corrections for fence intrinsics
>
> On 26/11/2014 02:04, Hans Boehm wrote:
>> To be concrete here, on Power, loads can normally be ordered by
>> an address dependency or a light-weight fence (lwsync). However,
>> neither is enough to prevent the questionable outcome for IRIW,
>> since neither ensures that the stores in T1 and T2 will be made
>> visible to other threads in a consistent order. That outcome can
>> be prevented by using heavyweight fence (sync) instructions
>> between the loads instead.
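>>
>> For concreteness, a minimal Java sketch of IRIW (the class, field and
>> method names are just illustrative):
>>
>>     class IRIW {
>>         volatile int foo, bar;              // two independently written flags
>>         int r1, r2, r3, r4;
>>
>>         void t1() { foo = 1; }              // writer 1
>>         void t2() { bar = 1; }              // writer 2
>>         void t3() { r1 = bar; r2 = foo; }   // reader 1
>>         void t4() { r3 = foo; r4 = bar; }   // reader 2
>>     }
>>
>> Because the fields are volatile, the JMM requires a single total order
>> over these accesses, so r1 == 1, r2 == 0, r3 == 1, r4 == 0 must be
>> impossible; on Power that guarantee is exactly what the sync between
>> the readers' loads pays for.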
>
> Why would they need fences between loads instead of syncing the
> order of stores?
>
>
> Alex
>
>
>> Peter Sewell's group concluded that to enforce correct volatile
>> behavior on Power, you essentially need a heavyweight fence
>> between every pair of volatile operations. That cannot
>> be understood based on simple ordering constraints.
>>
>> As Stephan pointed out, there are similar issues on ARM, but
>> they're less commonly encountered in a Java implementation. If
>> you're lucky, you can get to the right implementation recipe by
>> looking only at reordering, I think.
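>>
>> (A hedged sketch of what that means in practice, using the leading-sync
>> mapping published with the Cambridge Power work -- other schemes exist,
>> and this is only an illustration, not HotSpot's actual code generation:)
>>
>>     class Flag {
>>         volatile int v;
>>
>>         void writer() {
>>             v = 1;          // roughly: hwsync; st
>>         }
>>
>>         int reader() {
>>             return v;       // roughly: hwsync; ld; cmp; bc; isync
>>         }
>>     }
>>
>> Either way, a full sync ends up between any volatile store and a
>> subsequent volatile load, which is the part a pure reordering model
>> doesn't explain.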
>>
>>
>> On Tue, Nov 25, 2014 at 4:36 PM, David Holmes
>> <davidcholmes at aapt.net.au> wrote:
>>
>> Stephan Diestelhorst writes:
>> >
>> > David Holmes wrote:
>> > > Stephan Diestelhorst writes:
>> > > > On Tuesday, 25 November 2014 at 11:15:36, Hans Boehm wrote:
>> > > > > I'm no hardware architect, but fundamentally it seems to me
>> > > > > that
>> > > > >
>> > > > >     load x
>> > > > >     acquire_fence
>> > > > >
>> > > > > imposes a much more stringent constraint than
>> > > > >
>> > > > >     load_acquire x
>> > > > >
>> > > > > Consider the case in which the load from x is an L1 hit, but a
>> > > > > preceding load (from, say, y) is a long-latency miss. If we
>> > > > > enforce ordering by just waiting for completion of the prior
>> > > > > operation, the former has to wait for the load from y to
>> > > > > complete, while the latter doesn't. I find it hard to believe
>> > > > > that this doesn't leave an appreciable amount of performance on
>> > > > > the table, at least for some interesting microarchitectures.
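>> > > > >
>> > > > > (A rough Java rendering of the two shapes, where U is a
>> > > > > sun.misc.Unsafe instance obtained elsewhere and loadAcquire is
>> > > > > a hypothetical acquiring load -- a sketch only:)
>> > > > >
>> > > > >     // (a) plain load plus acquire fence: the fence orders every
>> > > > >     //     earlier load, so it also waits for the miss on y
>> > > > >     int a = y;          // long-latency miss
>> > > > >     int b = x;          // L1 hit
>> > > > >     U.loadFence();      // JEP 171 intrinsic
>> > > > >     use(b);
>> > > > >
>> > > > >     // (b) acquiring load of x: only accesses after this load
>> > > > >     //     are ordered after it; the miss on y need not be
>> > > > >     //     waited for
>> > > > >     int c = y;
>> > > > >     int d = loadAcquire(x);   // hypothetical primitive
>> > > > >     use(d);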
>> > > >
>> > > > I agree, Hans, that this is a reasonable assumption. Load_acquire x
>> > > > does allow roach motel, whereas the acquire fence does not.
>> > > >
>> > > > > In addition, for better or worse, fencing requirements on at
>> > > > > least Power are actually driven as much by store atomicity
>> > > > > issues as by the ordering issues discussed in the cookbook.
>> > > > > This was not understood in 2005, and unfortunately doesn't seem
>> > > > > to be amenable to the kind of straightforward explanation as in
>> > > > > Doug's cookbook.
>> > > >
>> > > > Coming from a strongly ordered architecture to a weakly ordered
>> > > > one myself, I also needed some mental adjustment about store
>> > > > (multi-copy) atomicity. I can imagine others will be unaware of
>> > > > this difference, too, even in 2014.
>> > >
>> > > Sorry, I'm missing the connection between fences and multi-copy
>> > > atomicity.
>> >
>> > One example is the classic IRIW. With non-multi-copy-atomic stores,
>> > but ordered (say through a dependency) loads, in the following
>> > example:
>> >
>> >   Memory: foo = bar = 0
>> >   _T1_         _T2_         _T3_                _T4_
>> >   st (foo),1   st (bar),1   ld r1, (bar)        ld r3, (foo)
>> >                             <addr dep / local   <addr dep>
>> >                              "fence" here>
>> >                             ld r2, (foo)        ld r4, (bar)
>> >
>> > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy
>> > atomic machines. On TSO boxes, this is not possible. That means that
>> > the memory fence that will prevent such a behaviour (DMB on ARM)
>> > needs to carry some additional oomph in ensuring multi-copy
>> > atomicity, or rather prevent you from seeing it (which is the same
>> > thing).
>>
>> I take it as given that any code for which you may have ordering
>> constraints must first have basic atomicity properties for loads and
>> stores. I would not expect any kind of fence to add multi-copy
>> atomicity where there was none.
>>
>> David
>>
>> > Stephan
>> >
>