[jmm-dev] jdk9 APIs [Fences specifically]

Doug Lea dl at cs.oswego.edu
Thu Aug 13 12:19:17 UTC 2015


On 08/12/2015 06:33 PM, Hans Boehm wrote:
>
> Let me argue once more against LoadLoad, and at least dampen the
> enthusiasm for StoreStore.

Thanks for the critiques! (Even though I remain unconvinced.)

I should have noted that ARM mappings are only part of the motivation
for loadLoadFence and storeStoreFence.  Another is protection against
loop "optimizations" that are highly non-optimal.  This is not
strictly a compiler issue, but it is easier to illustrate as one.  Suppose
for example you have a method that writes several variables, along
with reader methods that can handle all ordering races among the
writes. But you still want to ensure that the variables are actually
written if the method is called in a loop. A trailing
storeStoreFence() seems to be the cheapest and conceptually most
appropriate way to reduce communication latency.  (In other words, it
is "correct" but undesirable for method c() here to see only the
final (x, y) values.)  Symmetrical arguments apply to using
leading loadLoadFences in the complementary reader methods
(which are otherwise similar to RCU-like constructions).

class C {
    int x = 0, y = 0; // relaxed

    void p() {      // called in producer thread
        for (int i = 0; i < 1000000; ++i)
            writes(heavyPureComputation(i));
    }

    void c() {      // called in consumer thread
        for (;;) {
            if (occasionally)
                reads();
            // ...
        }
    }

    void writes(int k) {
        x = k;
        y = k + 17;
        storeStoreFence(); // please actually store x and y if in a loop
    }

    void reads() {
        loadLoadFence();   // please actually load x and y if in a loop
        if (y == x + 17)
            something();
    }
}

This is not a hypothetical example. It's abstracted from cases I've
encountered. Like the RCU-like examples mentioned yesterday, these effects
arise only when you are writing racy performance-critical code.  But
that's what low-level concurrent algorithm and data structure
designers do!
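
For anyone who wants to run something like class C, here is a minimal
sketch. It assumes the fences are the static methods
VarHandle.storeStoreFence() and VarHandle.loadLoadFence() in
java.lang.invoke (JDK 9 and later). The class name FenceSketch, the
stand-in body of heavyPureComputation, the iteration parameter and the
polling loop in main are all illustrative assumptions, not part of the
cases the example was abstracted from.

```java
import java.lang.invoke.VarHandle;

// Hypothetical runnable variant of class C above.
class FenceSketch {
    int x = 0, y = 0; // plain (non-volatile) fields, accessed racily

    // Stand-in for heavyPureComputation; any pure function of i would do.
    static int heavyPureComputation(int i) {
        return i * 31;
    }

    void p(int iterations) { // called in producer thread
        for (int i = 0; i < iterations; ++i)
            writes(heavyPureComputation(i));
    }

    void writes(int k) {
        x = k;
        y = k + 17;
        // Stores above are not reordered with stores after the fence.
        VarHandle.storeStoreFence();
    }

    boolean reads() {
        // Loads before the fence are not reordered with the loads below.
        VarHandle.loadLoadFence();
        return y == x + 17; // may be false when a race is observed
    }

    public static void main(String[] args) throws InterruptedException {
        FenceSketch c = new FenceSketch();
        Thread producer = new Thread(() -> c.p(1_000_000));
        producer.start();

        long polls = 0, consistent = 0;
        while (producer.isAlive()) { // consumer: poll racily while the producer runs
            ++polls;
            if (c.reads())
                ++consistent;
        }
        producer.join(); // after join, the producer's final writes are visible

        System.out.println(consistent + " of " + polls
            + " racy polls saw a consistent (x, y) pair");
        System.out.println("consistent after join: " + c.reads());
    }
}
```

The counts printed by the first line vary from run to run, since the
reads race with the writes by design; only the check after join() is
deterministic.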

Back to ..

>
> I know of no hardware instructions, except on SPARC, that correspond
> to a LoadLoad fence.  And my impression is that it's not very useful on
> SPARC.  The ARM DMB xLD fence instruction, if I understand correctly,
> is essentially a C++ acquire fence.

But I think that pseudo-fences (load; compare to self; ...) need not be
equivalent to acquire fences?

>
> However, I think it difficult to specify correctly outside of that specific
> essentially final-field-initialization scenario.

It doesn't seem hard at all to specify in isolation.
The interactions with base ordering rules can be non-obvious though.
(Especially since, in the absence of a revised base model,
those rules might as well say that anything goes.)
So, like any fence method, it should be used when nothing
simpler applies. And surely not in:

>
> x++; // Increment zero initialized field
> storeStoreFence();
> x_init = true;
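
One reading of "nothing simpler applies" for the snippet quoted above
is that a volatile flag already does the job, with no explicit fence.
A minimal sketch under that assumption follows; the class name Counter
and the methods init and valueIfInit are hypothetical and appear
nowhere in the thread.

```java
// Hypothetical holder illustrating publication through a volatile flag.
class Counter {
    int x;                  // zero-initialized plain field
    volatile boolean xInit; // volatile flag that publishes x

    void init() {
        x++;          // read-modify-write of the plain field
        xInit = true; // volatile store: the increment happens-before
                      // any read of xInit that observes true
    }

    // Returns x once initialization has been observed, otherwise -1.
    int valueIfInit() {
        return xInit ? x : -1;
    }
}
```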

-Doug


