[concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics
Oleksandr Otenko
oleksandr.otenko at oracle.com
Tue Dec 9 21:53:32 UTC 2014
Yes, I do understand the reader needs barriers, too. I guess I was
wondering more why the reader would need something stronger than what
dependencies etc. could enforce. I guess I'll read what Martin forwarded
first.
Alex
On 09/12/2014 21:37, David Holmes wrote:
> See my earlier response to Martin. The reader has to force a
> consistent view of memory - the writer can't as the write escapes
> before it can issue the barrier.
> David
>
> -----Original Message-----
> From: concurrency-interest-bounces at cs.oswego.edu
> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
> Oleksandr Otenko
> Sent: Wednesday, 10 December 2014 6:04 AM
> To: Hans Boehm; dholmes at ieee.org
> Cc: core-libs-dev; concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] RFR: 8065804:
> JEP171:Clarifications/corrections for fence intrinsics
>
> On 26/11/2014 02:04, Hans Boehm wrote:
>> To be concrete here, on Power, loads can normally be ordered by
>> an address dependency or a light-weight fence (lwsync). However,
>> neither is enough to prevent the questionable outcome for IRIW,
>> since neither ensures that the stores in T1 and T2 will be made
>> visible to other threads in a consistent order. That outcome can
>> be prevented by using heavyweight fence (sync) instructions
>> between the loads instead.
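>>
>> For concreteness, a minimal Java sketch of IRIW (the class, field and
>> method names are just illustrative):
>>
>>     class IRIW {
>>         volatile int foo, bar;              // two independently written flags
>>         int r1, r2, r3, r4;
>>
>>         void t1() { foo = 1; }              // writer 1
>>         void t2() { bar = 1; }              // writer 2
>>         void t3() { r1 = bar; r2 = foo; }   // reader 1
>>         void t4() { r3 = foo; r4 = bar; }   // reader 2
>>     }
>>
>> Because the fields are volatile, the JMM requires a single total order
>> over these accesses, so r1 == 1, r2 == 0, r3 == 1, r4 == 0 must be
>> impossible; on Power that guarantee is exactly what the sync between
>> the readers' loads pays for.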
>
> Why would they need fences between loads instead of syncing the
> order of stores?
>
>
> Alex
>
>
>> Peter Sewell's group concluded that to enforce correct volatile
>> behavior on Power, you essentially need a heavyweight fence
>> between every pair of volatile operations. That cannot
>> be understood based on simple ordering constraints.
>>
>> As Stephan pointed out, there are similar issues on ARM, but
>> they're less commonly encountered in a Java implementation. If
>> you're lucky, you can get to the right implementation recipe by
>> looking only at reordering, I think.
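>>
>> (A hedged sketch of what that means in practice, using the leading-sync
>> mapping published with the Cambridge Power work -- other schemes exist,
>> and this is only an illustration, not HotSpot's actual code generation:)
>>
>>     class Flag {
>>         volatile int v;
>>
>>         void writer() {
>>             v = 1;          // roughly: hwsync; st
>>         }
>>
>>         int reader() {
>>             return v;       // roughly: hwsync; ld; cmp; bc; isync
>>         }
>>     }
>>
>> Either way, a full sync ends up between any volatile store and a
>> subsequent volatile load, which is the part a pure reordering model
>> doesn't explain.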
>>
>>
>> On Tue, Nov 25, 2014 at 4:36 PM, David Holmes
>> <davidcholmes at aapt.net.au> wrote:
>>
>> Stephan Diestelhorst writes:
>> >
>> > David Holmes wrote:
>> > > Stephan Diestelhorst writes:
>> > > > On Tuesday, 25 November 2014 at 11:15:36, Hans Boehm wrote:
>> > > > > I'm no hardware architect, but fundamentally it seems to me
>> > > > > that
>> > > > >
>> > > > >     load x
>> > > > >     acquire_fence
>> > > > >
>> > > > > imposes a much more stringent constraint than
>> > > > >
>> > > > >     load_acquire x
>> > > > >
>> > > > > Consider the case in which the load from x is an L1 hit, but a
>> > > > > preceding load (from, say, y) is a long-latency miss. If we
>> > > > > enforce ordering by just waiting for completion of the prior
>> > > > > operation, the former has to wait for the load from y to
>> > > > > complete, while the latter doesn't. I find it hard to believe
>> > > > > that this doesn't leave an appreciable amount of performance on
>> > > > > the table, at least for some interesting microarchitectures.
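>> > > > >
>> > > > > (A rough Java rendering of the two shapes, where U is a
>> > > > > sun.misc.Unsafe instance obtained elsewhere and loadAcquire is
>> > > > > a hypothetical acquiring load -- a sketch only:)
>> > > > >
>> > > > >     // (a) plain load plus acquire fence: the fence orders every
>> > > > >     //     earlier load, so it also waits for the miss on y
>> > > > >     int a = y;          // long-latency miss
>> > > > >     int b = x;          // L1 hit
>> > > > >     U.loadFence();      // JEP 171 intrinsic
>> > > > >     use(b);
>> > > > >
>> > > > >     // (b) acquiring load of x: only accesses after this load
>> > > > >     //     are ordered after it; the miss on y need not be
>> > > > >     //     waited for
>> > > > >     int c = y;
>> > > > >     int d = loadAcquire(x);   // hypothetical primitive
>> > > > >     use(d);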
>> > > >
>> > > > I agree, Hans, that this is a reasonable assumption. Load_acquire x
>> > > > does allow roach motel, whereas the acquire fence does not.
>> > > >
>> > > > > In addition, for better or worse, fencing requirements on at
>> > > > > least Power are actually driven as much by store atomicity
>> > > > > issues as by the ordering issues discussed in the cookbook.
>> > > > > This was not understood in 2005, and unfortunately doesn't seem
>> > > > > to be amenable to the kind of straightforward explanation as in
>> > > > > Doug's cookbook.
>> > > >
>> > > > Coming from a strongly ordered architecture to a weakly ordered
>> > > > one myself, I also needed some mental adjustment about store
>> > > > (multi-copy) atomicity. I can imagine others will be unaware of
>> > > > this difference, too, even in 2014.
>> > >
>> > > Sorry, I'm missing the connection between fences and multi-copy
>> > > atomicity.
>> >
>> > One example is the classic IRIW. With non-multi-copy-atomic stores,
>> > but ordered (say through a dependency) loads, in the following
>> > example:
>> >
>> >   Memory: foo = bar = 0
>> >   _T1_         _T2_         _T3_                _T4_
>> >   st (foo),1   st (bar),1   ld r1, (bar)        ld r3, (foo)
>> >                             <addr dep / local   <addr dep>
>> >                              "fence" here>
>> >                             ld r2, (foo)        ld r4, (bar)
>> >
>> > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy
>> > atomic machines. On TSO boxes, this is not possible. That means that
>> > the memory fence that will prevent such a behaviour (DMB on ARM)
>> > needs to carry some additional oomph in ensuring multi-copy
>> > atomicity, or rather prevent you from seeing it (which is the same
>> > thing).
>>
>> I take it as given that any code for which you may have ordering
>> constraints must first have basic atomicity properties for loads and
>> stores. I would not expect any kind of fence to add multi-copy
>> atomicity where there was none.
>>
>> David
>>
>> > Stephan
>> >
>