[concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics
Stephan Diestelhorst
stephan.diestelhorst at gmail.com
Tue Nov 25 23:24:59 UTC 2014
On Tuesday, 25 November 2014, 11:15:36, Hans Boehm wrote:
> I'm no hardware architect, but fundamentally it seems to me that
>
> load x
> acquire_fence
>
> imposes a much more stringent constraint than
>
> load_acquire x
>
> Consider the case in which the load from x is an L1 hit, but a preceding
> load (from say y) is a long-latency miss. If we enforce ordering by just
> waiting for completion of prior operations, the former has to wait for the
> load from y to complete; while the latter doesn't. I find it hard to
> believe that this doesn't leave an appreciable amount of performance on the
> table, at least for some interesting microarchitectures.
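I agree, Hans, that this is a reasonable assumption. load_acquire x
still allows roach-motel reordering, so the earlier load from y may
complete after it, whereas the acquire fence orders every preceding
load and therefore does not.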
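
To make the two forms concrete, here is a minimal Java sketch. It assumes
the VarHandle API from Java 9, which postdates this thread; the class and
the fields x, y and z are hypothetical names chosen for illustration. It
only shows where the ordering constraint attaches and does not measure
any performance difference.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical holder class; x, y and z stand in for shared variables.
final class AcquireForms {
    int x;
    int y;
    int z;

    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findVarHandle(AcquireForms.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Form 1: plain loads followed by a standalone acquire fence.
    // The fence keeps ALL earlier loads (y as well as x) from being
    // reordered with the later load of z.
    int loadThenFence() {
        int a = y;                 // earlier load, possibly a long-latency miss
        int b = x;                 // plain load of x
        VarHandle.acquireFence();  // constrains both loads above
        int c = z;                 // later access
        return a + b + c;
    }

    // Form 2: load-acquire on x only.
    // Only the load of x is ordered before the later load of z;
    // the earlier load of y is free to complete after it.
    int loadAcquire() {
        int a = y;                      // earlier load, not constrained
        int b = (int) X.getAcquire(this);
        int c = z;                      // later access
        return a + b + c;
    }
}
```

Both methods return the same value in a single-threaded run; the
difference lies only in which reorderings the implementation may perform.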
> In addition, for better or worse, fencing requirements on at least
> Power are actually driven as much by store atomicity issues, as by
> the ordering issues discussed in the cookbook. This was not
> understood in 2005, and unfortunately doesn't seem to be amenable to
> the kind of straightforward explanation as in Doug's cookbook.
Having moved from a strongly ordered architecture to a weakly ordered
one myself, I also had to adjust my mental model of store (multi-copy)
atomicity. I can imagine that others are still unaware of this
difference, even in 2014.
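
The usual way to illustrate multi-copy atomicity is the IRIW
(independent reads of independent writes) litmus test, sketched below in
Java. The class and method names are hypothetical. The sketch uses
volatile fields, which the Java memory model makes sequentially
consistent, so the outcome in which the two readers see the two stores
in opposite orders should never be counted. With weaker accesses on a
machine that is not multi-copy atomic, that outcome is possible unless
additional fences are used.

```java
// Hypothetical IRIW litmus test; all names are illustrative only.
final class IriwLitmus {
    static final class State {
        volatile int x;      // written by writer 1
        volatile int y;      // written by writer 2
        int r1, r2, r3, r4;  // reader results, inspected after join()
    }

    // Counts iterations in which reader A saw x before y while
    // reader B saw y before x, i.e. the stores were observed in
    // different orders by different threads.
    static int countSplitObservations(int iterations) {
        int split = 0;
        for (int i = 0; i < iterations; i++) {
            State s = new State();
            Thread[] threads = {
                new Thread(() -> s.x = 1),                        // writer 1
                new Thread(() -> s.y = 1),                        // writer 2
                new Thread(() -> { s.r1 = s.x; s.r2 = s.y; }),    // reader A
                new Thread(() -> { s.r3 = s.y; s.r4 = s.x; }),    // reader B
            };
            for (Thread t : threads) {
                t.start();
            }
            try {
                for (Thread t : threads) {
                    t.join();  // join() makes the r1..r4 writes visible here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (s.r1 == 1 && s.r2 == 0 && s.r3 == 1 && s.r4 == 0) {
                split++;
            }
        }
        return split;
    }
}
```

Calling IriwLitmus.countSplitObservations(1000) should return 0 with the
volatile fields as written. Starting fresh threads per iteration keeps
the sketch short but makes races rare, so this is an illustration of the
forbidden outcome rather than a stress test.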
Stephan