[concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics
Oleksandr Otenko
oleksandr.otenko at oracle.com
Tue Dec 9 20:34:25 UTC 2014
Is the thorn the many allowed outcomes, or the single disallowed
outcome? (E.g., is order consistency too strict for stores with no
synchronizes-with between them?)
Alex
On 26/11/2014 02:10, David Holmes wrote:
> Hi Hans,
> Given that IRIW is a thorn in everyone's side, has no known useful
> benefit, and can hopefully be killed off in the future, let's not get
> bogged down in IRIW. But none of what you say below relates to
> multi-copy-atomicity.
> Cheers,
> David
>
> -----Original Message-----
> *From:* hjkhboehm at gmail.com [mailto:hjkhboehm at gmail.com]*On Behalf
> Of *Hans Boehm
> *Sent:* Wednesday, 26 November 2014 12:04 PM
> *To:* dholmes at ieee.org
> *Cc:* Stephan Diestelhorst; concurrency-interest at cs.oswego.edu;
> core-libs-dev
> *Subject:* Re: [concurrency-interest] RFR: 8065804:
> JEP171:Clarifications/corrections for fence intrinsics
>
> To be concrete here, on Power, loads can normally be ordered by an
> address dependency or a light-weight fence (lwsync). However,
> neither is enough to prevent the questionable outcome for IRIW,
> since neither ensures that the stores in T1 and T2 will be made
> visible to other threads in a consistent order. That outcome can
> be prevented by using heavyweight fence (sync) instructions
> between the loads instead. Peter Sewell's group concluded that to
> enforce correct volatile behavior on Power, you essentially need a
> heavyweight fence between every pair of volatile operations.
> That cannot be understood based on simple ordering constraints.
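
The IRIW shape Hans describes can be written down as a small Java litmus
sketch (class and field names here are invented for illustration; a single
run only samples one interleaving, so tools like jcstress are the real way
to hammer such tests). With volatile fields, the JMM's total order over
volatile accesses forbids the outcome that plain loads on Power can
exhibit:

```java
// IRIW litmus sketch: two writers, two readers observing in opposite
// orders. With volatile, the JMM forbids the outcome
// r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0, which is what the sync
// fences buy on Power.
public class Iriw {
    static volatile int foo = 0, bar = 0;
    static int r1, r2, r3, r4;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> foo = 1);
        Thread t2 = new Thread(() -> bar = 1);
        Thread t3 = new Thread(() -> { r1 = bar; r2 = foo; });
        Thread t4 = new Thread(() -> { r3 = foo; r4 = bar; });
        t1.start(); t2.start(); t3.start(); t4.start();
        t1.join(); t2.join(); t3.join(); t4.join();
        // a compliant JVM can never take this branch
        boolean forbidden = r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
        System.out.println(forbidden ? "FORBIDDEN" : "ok");
    }
}
```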
>
> As Stephan pointed out, there are similar issues on ARM, but
> they're less commonly encountered in a Java implementation. If
> you're lucky, you can get to the right implementation recipe by
> looking only at reordering, I think.
>
>
> On Tue, Nov 25, 2014 at 4:36 PM, David Holmes
> <davidcholmes at aapt.net.au <mailto:davidcholmes at aapt.net.au>> wrote:
>
> Stephan Diestelhorst writes:
> >
> > David Holmes wrote:
> > > Stephan Diestelhorst writes:
> > > > On Tuesday, 25 November 2014 at 11:15:36, Hans Boehm wrote:
> > > > > I'm no hardware architect, but fundamentally it seems to me
> > > > > that
> > > > >
> > > > >     load x
> > > > >     acquire_fence
> > > > >
> > > > > imposes a much more stringent constraint than
> > > > >
> > > > >     load_acquire x
> > > > >
> > > > > Consider the case in which the load from x is an L1 hit, but a
> > > > > preceding load (from, say, y) is a long-latency miss. If we
> > > > > enforce ordering by just waiting for completion of the prior
> > > > > operation, the former has to wait for the load from y to
> > > > > complete, while the latter doesn't. I find it hard to believe
> > > > > that this doesn't leave an appreciable amount of performance on
> > > > > the table, at least for some interesting microarchitectures.
> > > >
> > > > I agree, Hans, that this is a reasonable assumption. Load_acquire x
> > > > does allow roach motel, whereas the acquire fence does not.
> > > >
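
The two shapes under discussion later became expressible directly in Java
via VarHandles (JDK 9+). A single-threaded sketch purely to show the API
surface — the class name and field are invented, and this says nothing
about the relative cost of the two forms:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireShapes {
    static int x = 42;
    static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findStaticVarHandle(AcquireShapes.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int[] readBoth() {
        // Shape 1: plain load, then a free-standing acquire fence.
        // The fence orders *every* earlier load before later accesses,
        // so conceptually it must also wait for unrelated in-flight
        // loads (e.g. Hans's long-latency miss on y).
        int v1 = (int) X.get();
        VarHandle.acquireFence();

        // Shape 2: the load itself carries acquire semantics. Only
        // this load is ordered; later accesses may still be moved
        // above unrelated earlier loads (roach motel).
        int v2 = (int) X.getAcquire();
        return new int[] { v1, v2 };
    }

    public static void main(String[] args) {
        int[] r = readBoth();
        System.out.println(r[0] + " " + r[1]);
    }
}
```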
> > > > > In addition, for better or worse, fencing requirements on at
> > > > > least Power are actually driven as much by store atomicity
> > > > > issues as by the ordering issues discussed in the cookbook.
> > > > > This was not understood in 2005, and unfortunately doesn't
> > > > > seem to be amenable to the kind of straightforward explanation
> > > > > given in Doug's cookbook.
> > > >
> > > > Coming from a strongly ordered architecture to a weakly ordered
> > > > one myself, I also needed some mental adjustment about store
> > > > (multi-copy) atomicity. I can imagine others will be unaware of
> > > > this difference, too, even in 2014.
> > >
> > > Sorry, I'm missing the connection between fences and multi-copy
> > > atomicity.
> >
> > One example is the classic IRIW. With non-multi-copy-atomic stores,
> > but ordered (say through a dependency) loads, in the following
> > example:
> >
> > Memory: foo = bar = 0
> >  _T1_        _T2_        _T3_                        _T4_
> >  st (foo),1  st (bar),1  ld r1,(bar)                 ld r3,(foo)
> >                          <addr dep / local "fence">  <addr dep>
> >                          ld r2,(foo)                 ld r4,(bar)
> >
> > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy
> > atomic machines. On TSO boxes, this is not possible. That means
> > that the memory fence that will prevent such a behaviour (DMB on
> > ARM) needs to carry some additional oomph in ensuring multi-copy
> > atomicity, or rather prevent you from seeing it (which is the same
> > thing).
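
A message-passing sketch of the fence intrinsics this RFR concerns, via
their public VarHandle successors (JDK 9+; class and field names invented):
the release fence orders the data store before the flag store, and the
acquire fence orders the flag load before the data load. Note, per the
discussion, that these lightweight fences (lwsync/DMB-ld flavoured) do
not by themselves restore multi-copy-atomic observation; on Power only a
full fence (sync) between a reader's two loads forbids the IRIW outcome.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class FenceSketch {
    static int data;        // plain field, published via fences
    static boolean ready;   // flag, accessed in opaque mode
    static int seen = -1;   // what the reader observed
    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup().findStaticVarHandle(
                    FenceSketch.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;
            // storeFence analogue: keeps the data store before the
            // flag store (typically lwsync on Power, dmb ish on ARMv8)
            VarHandle.releaseFence();
            READY.setOpaque(true);
        });
        Thread reader = new Thread(() -> {
            while (!(boolean) READY.getOpaque()) { Thread.onSpinWait(); }
            // loadFence analogue: keeps the flag load before the data
            // load (typically lwsync on Power, dmb ishld on ARMv8)
            VarHandle.acquireFence();
            seen = data;
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.println("reader saw " + seen);
    }
}
```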
>
> I take it as given that any code for which you may have ordering
> constraints must first have basic atomicity properties for loads and
> stores. I would not expect any kind of fence to add
> multi-copy-atomicity where there was none.
>
> David
>
> > Stephan
> >
> > _______________________________________________
> > Concurrency-interest mailing list
> > Concurrency-interest at cs.oswego.edu
> <mailto:Concurrency-interest at cs.oswego.edu>
> > http://cs.oswego.edu/mailman/listinfo/concurrency-interest
More information about the core-libs-dev mailing list