[concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Tue Nov 25 23:24:59 UTC 2014

On Tuesday, 25 November 2014, 11:15:36, Hans Boehm wrote:
> I'm no hardware architect, but fundamentally it seems to me that
> 
> load x
> acquire_fence
> 
> imposes a much more stringent constraint than
> 
> load_acquire x
> 
> Consider the case in which the load from x is an L1 hit, but a preceding
> load (from say y) is a long-latency miss.  If we enforce ordering by just
> waiting for completion of prior operation, the former has to wait for the
> load from y to complete; while the latter doesn't.  I find it hard to
> believe that this doesn't leave an appreciable amount of performance on the
> table, at least for some interesting microarchitectures.

I agree, Hans, that this is a reasonable assumption.  load_acquire x
still permits roach-motel reordering (earlier accesses, such as the
load from y, may move down past it), whereas the acquire fence does
not.
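
As an illustration of the two forms, here is a minimal Java sketch.  It
assumes the Java 9+ VarHandle API, which postdates this thread (JEP 171
itself exposes the fences through sun.misc.Unsafe).  The class
AcquireDemo, its fields, and its method names are hypothetical and only
serve the example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical demo class; names are illustrative, not from the thread.
class AcquireDemo {
    int y;      // stands in for the earlier, possibly long-latency load
    int data;   // payload published through the flag
    int x;      // the flag ("x" in the discussion above)

    static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup().findVarHandle(AcquireDemo.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int value) {
        data = value;
        X.setRelease(this, 1);   // release store pairs with both readers below
    }

    // "load x; acquire_fence": the fence orders ALL earlier loads
    // (including the load of y) before every later load and store.
    int readWithFence() {
        int early = y;
        while ((int) X.getOpaque(this) == 0) {
            Thread.onSpinWait();
        }
        VarHandle.acquireFence();
        return early + data;
    }

    // "load_acquire x": only the load of x is ordered before later
    // accesses; the earlier load of y is not constrained by it.
    int readWithLoadAcquire() {
        int early = y;
        while ((int) X.getAcquire(this) == 0) {
            Thread.onSpinWait();
        }
        return early + data;
    }

    // Runs both readers against one publisher and returns what they saw.
    static int[] demo() {
        AcquireDemo d = new AcquireDemo();
        int[] out = new int[2];
        Thread reader = new Thread(() -> {
            out[0] = d.readWithFence();
            out[1] = d.readWithLoadAcquire();
        });
        reader.start();
        d.publish(42);
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Both readers observe the published value.  The difference lies only in
how much reordering freedom each form leaves to the hardware and the
compiler, which this sketch cannot measure.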

>  In addition, for better or worse, fencing requirements on at least
>  Power are actually driven as much by store atomicity issues, as by
>  the ordering issues discussed in the cookbook.  This was not
>  understood in 2005, and unfortunately doesn't seem to be amenable to
>  the kind of straightforward explanation as in Doug's cookbook.

Coming from a strongly ordered architecture to a weakly ordered one
myself, I also needed some mental adjustment about store (multi-copy)
atomicity.  I can imagine others will be unaware of this difference,
too, even in 2014.
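
For readers unfamiliar with the term, the usual way to show store
(multi-copy) atomicity is the IRIW litmus test (independent reads of
independent writes).  The following Java sketch is an illustration and
not part of the original thread; the class Iriw and its method are
hypothetical names.  With volatile fields the Java memory model forbids
the two readers from seeing the two writes in opposite orders.  With
weaker access modes such as acquire/release that outcome is permitted,
and it can occur on hardware that is not multi-copy atomic.

```java
// Hypothetical litmus-test class; names are illustrative only.
class Iriw {
    volatile int x;
    volatile int y;

    // Runs the IRIW test repeatedly and counts how often the two readers
    // disagree about the order of the two independent writes.
    static int countDisagreements(int iterations) {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            Iriw s = new Iriw();
            int[] r = new int[4];
            Thread[] threads = {
                new Thread(() -> s.x = 1),                       // writer 1
                new Thread(() -> s.y = 1),                       // writer 2
                new Thread(() -> { r[0] = s.x; r[1] = s.y; }),   // reads x, then y
                new Thread(() -> { r[2] = s.y; r[3] = s.x; }),   // reads y, then x
            };
            for (Thread t : threads) {
                t.start();
            }
            for (Thread t : threads) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            }
            // Reader 1 saw x before y, reader 2 saw y before x.
            if (r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }
}
```

Calling Iriw.countDisagreements(200) returns 0 on a conforming JVM
because both fields are volatile.  Enforcing that on a machine without
multi-copy atomicity is what requires the extra fencing mentioned in
the quoted text.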

Stephan