[jmm-dev] jdk9 APIs [Fences specifically]

Thu Aug 13 23:56:16 UTC 2015

On Thu, Aug 13, 2015 at 4:19 PM, Doug Lea <dl at cs.oswego.edu> wrote:
>
> On 08/13/2015 05:04 PM, Hans Boehm wrote:
>
>> I don't think a fence-based approach works.  Deferring all the stores to
>> the end of the loop fundamentally remains correct, even with the
>> StoreStore fence, since it's consistent with the producer just running
>> very fast for a while.  The constraint you're trying to enforce has
>> nothing to do with ordering.
>
>
> I must be missing something fundamental about C++ specs. Are C++
> compilers allowed to ignore release fences in between writes
> to the same variables? In unrolled form, that's what this would
> amount to here.

I think that's unavoidable.  If I write

for (...) {
    x = something_expensive();
    fence;
}

it's very hard to prevent the implementation from implementing that as

<pause for a while>
<run the above loop instantaneously and atomically>

And that looks exactly like merging all the stores into one.
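
A minimal Java sketch of the argument above. It assumes JDK 9's VarHandle.storeStoreFence() plays the role of `fence`; the class and method names are hypothetical. `merged` is a transformation the fence does not rule out. A fence orders stores relative to one another but says nothing about when they become visible, so publishing only the last value looks, to any other thread, like "pause for a while, then run the loop instantaneously".

```java
import java.lang.invoke.VarHandle;

final class MergedStores {
    // Shared variable written by the producer (plain, non-volatile field).
    static int x;

    // Hypothetical stand-in for something_expensive().
    static int somethingExpensive(int i) {
        int acc = i;
        for (int k = 0; k < 1_000; k++) {
            acc = acc * 31 + k;
        }
        return acc;
    }

    // The loop as written: one store and one StoreStore fence per iteration.
    static void perIteration(int n) {
        for (int i = 0; i < n; i++) {
            x = somethingExpensive(i);
            VarHandle.storeStoreFence();
        }
    }

    // A form the fence does not forbid: compute everything first, then
    // publish only the final value with a single store.
    static void merged(int n) {
        int last = x;
        for (int i = 0; i < n; i++) {
            last = somethingExpensive(i);
        }
        x = last;
        VarHandle.storeStoreFence();
    }

    public static void main(String[] args) {
        perIteration(10);
        int a = x;
        x = 0;
        merged(10);
        System.out.println(a == x); // prints "true": same final state either way
    }
}
```

Only the intermediate values differ between the two forms, and no thread is guaranteed to ever see those.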

I don't think you can really disallow that kind of store merging without
also disallowing a time-slicing uniprocessor scheduler.  They're
behaviorally identical.

That doesn't prevent us from providing the compiler with advice to
discourage that.  But I think this doesn't have anything to do with fences.

>
>>
>> Aside from not working correctly, you end up slowing down ARM code in
>> ways that are entirely unnecessary, by inserting "dmb ishld" or
>> "dmb ishst" fences everywhere.  (How expensive they are varies.  On a
>> number of implementations they basically seem to be full fences.)
>
>
> Right. It does put the programmer in control though; for example
>   if ((i % 100) == 99) storeStoreFence()
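
Doug's periodic-fence idea, sketched in Java under the same assumption that VarHandle.storeStoreFence() is the JDK 9 spelling of storeStoreFence(). The class name and the fencesIssued counter are hypothetical; the counter exists only to make the fence frequency observable. The fence cost is paid once per 100 iterations, but, per the objection earlier in the thread, the fence still only orders stores rather than forcing them out.

```java
import java.lang.invoke.VarHandle;

final class PeriodicFence {
    static int x;            // shared, plain field
    static int fencesIssued; // hypothetical counter, only to show the frequency

    static void produce(int n) {
        for (int i = 0; i < n; i++) {
            x = i * i; // stand-in for the real work
            // Pay for a fence only once per 100 iterations instead of every time.
            if ((i % 100) == 99) {
                VarHandle.storeStoreFence();
                fencesIssued++;
            }
        }
    }

    public static void main(String[] args) {
        produce(1_000);
        System.out.println(fencesIssued); // prints 10
    }
}
```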

Agreed.  But it seems to me that this is control over an unnecessary
trade-off.  Clearly the ideal code involves no fences.  And in most cases,
just having the programmer specify where the stores to shared variables
should go, and having the compiler leave them alone, seems like a better
and simpler way to control this.
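
One way to approximate this in JDK 9 terms, assuming VarHandle opaque-mode access is an acceptable stand-in for "a store the compiler leaves alone" (an assumption; this message does not name an API). Each setOpaque is issued in program order with no fence and no ordering guarantee relative to other variables, and a consumer polls with getOpaque. The class and field names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class OpaqueProducer {
    static int x; // shared field, accessed only through X below

    static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findStaticVarHandle(OpaqueProducer.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // One opaque store per iteration: program order for this variable,
    // but no fence and no ordering with respect to other variables.
    static void produce(int n) {
        for (int i = 1; i <= n; i++) {
            X.setOpaque(i);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 1_000_000;
        Thread consumer = new Thread(() -> {
            // Poll with opaque reads until the final value shows up.
            while ((int) X.getOpaque() != n) {
                Thread.onSpinWait();
            }
        });
        consumer.start();
        produce(n);
        consumer.join();
        System.out.println((int) X.getOpaque()); // prints 1000000
    }
}
```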

Hans

>
> Considering that the goal is communication latency reduction at
> the expense of throughput, only the programmer would be able
> to make these tradeoffs.
>
> -Doug