[jmm-dev] jdk9 APIs [Fences specifically]

Fri Aug 14 18:36:55 UTC 2015

On Fri, Aug 14, 2015 at 4:50 AM, Doug Lea <dl at cs.oswego.edu> wrote:

> So there are limitations in the ability of ordering control to
> improve responsiveness. Which is unsurprising given all the
> other limitations under weak scheduling guarantees. But
> that's not much of an argument for not even allowing it.

I think it's not a limitation; it's just the wrong mechanism.
The mechanism you want doesn't care about reordering of
memory visibility to other threads.  That's what fences are
about.  It does care about merging of memory operations and
reordering with local compute operations.  Fences are not about that.
As you point out, that's kind of the domain of C-style volatiles,
though they really only address the "combining" part.

> [Hans:]
>> But I think this doesn't have anything to do with fences.
>
> Ordering constraints seem intrinsic to the problem at hand.
> It's the complement of the main issue in RCU/consume:
> "really read this" vs "really write this".

I think the RCU constraint is about ordering.  It's not "really read this".
CSE on two consume loads is OK for correctness, though
probably otherwise evil.  And my understanding is that
you really want to order loads with respect to dependent stores
as well.

...
>
> But even if so, it seems better to have a uniform API:
>   writes:  full > release > storeStore
>   reads:   full > acquire > loadLoad
> even if loadLoad is internally mapped to acquire.
> And omitting it feels even more wrong if we support
> RCU-like usages with scoped loadLoadFence(Object ref).

C++ has a great solution to that: include neither storeStore
nor loadLoad.
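
For concreteness, here is a minimal sketch of where each fence in the proposed hierarchy would sit in a simple flag-based handoff. It assumes the fences are exposed as static methods on java.lang.invoke.VarHandle (fullFence, releaseFence, storeStoreFence, acquireFence, loadLoadFence); the class and field names are hypothetical, and the code only illustrates placement, not a recommended idiom.

```java
import java.lang.invoke.VarHandle;

// Hypothetical example class; only the VarHandle fence methods are real API.
final class FenceSketch {
    static int data;       // plain field: the payload
    static boolean ready;  // plain field: the "published" flag

    // Writer side. Proposed ordering: full > release > storeStore.
    static void publish(int value) {
        data = value;
        // Keeps earlier loads and stores from being reordered with later stores.
        // VarHandle.fullFence() would be the stronger choice here;
        // VarHandle.storeStoreFence() the weaker one (earlier stores only).
        VarHandle.releaseFence();
        ready = true;
    }

    // Reader side. Proposed ordering: full > acquire > loadLoad.
    // Returns -1 if nothing has been published yet.
    static int consume() {
        if (!ready) {
            return -1;
        }
        // Keeps earlier loads from being reordered with later loads and stores.
        // VarHandle.loadLoadFence() would be the weaker choice (later loads only).
        VarHandle.acquireFence();
        return data;
    }
}
```

Dropping storeStore and loadLoad, as C++ does, leaves only the release/acquire and full variants shown in the uncommented calls.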

As I argued in my earlier message, storeStore
really only makes sense for ordering prior writes to fields
that are subsequently treated as read-only.  That's a really
narrow application domain but probably a disproportionate security
bug magnet (for those people who care about Java security).

If we were to generalize the final field ordering guarantees
to other fields, I would argue that storeStore is no longer a safe
implementation of the constructor barrier.  People would
"naively" expect that if they just wrote a field in a constructor,
and the object has not yet been published, it should read
back as the value they just wrote.  This becomes an unsafe
assumption if storeStore is used.

My impression is that the "st" variant of "dmb" is only a win
on a smallish number of microarchitectures.  Someone from
ARM should confirm or deny.

The more I think about it, the less confident I am that the use
of "dmb ishst" is actually correct as a constructor barrier to
start with.  Consider:

Thread 1:
t1.f = 1;   // Final field write in constructor
       // Implicit constructor StoreStore fence ("dmb ishst") here.
a = t1; // Correctly publish outside constructor

Thread 2:
t2.f2 = a;  // Final field write in constructor
       // Implicit constructor StoreStore fence
b = t2; // Correctly publish outside constructor

Thread 3:
if (b != null && b.f2 != null) {
    r = b.f2.f;  // Guaranteed to see 1?
}

Is thread 3 guaranteed to see an initialized f?  Based on the
ARM spec, I'm not sure either way.  In any case, this seems
really hard to specify.  Yet it's likely that most real users will
(or at least should) care.  Unlike the OOTA issues where we
kind of all know what we mean, I don't think hand-waving works
here.
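
The three threads above can be written out as a small Java harness, sketched below. The class names (FinalFieldLitmus, T1, T2) and the runOnce helper are hypothetical, and the plain static fields a and b are an assumption standing in for "publish outside constructor" via a data race. A harness like this cannot settle the specification question: a run that never observes anything odd proves nothing, and many platforms will never exhibit the reordering at all.

```java
// Hypothetical harness for the three-thread example; names are illustrative.
final class FinalFieldLitmus {
    static final class T1 {
        final int f;
        T1() { f = 1; }          // final field write, constructor barrier follows
    }

    static final class T2 {
        final T1 f2;
        T2(T1 x) { f2 = x; }     // final field write, constructor barrier follows
    }

    static T1 a;  // plain field: racy publication by thread 1
    static T2 b;  // plain field: racy publication by thread 2

    // Runs the three threads once. Returns the value of b.f2.f that
    // thread 3 observed, or -1 if it did not get as far as reading f.
    static int runOnce() throws InterruptedException {
        a = null;
        b = null;
        int[] seen = { -1 };
        Thread th1 = new Thread(() -> a = new T1());
        Thread th2 = new Thread(() -> b = new T2(a));  // may capture a == null
        Thread th3 = new Thread(() -> {
            T2 t = b;
            if (t != null && t.f2 != null) {
                seen[0] = t.f2.f;  // the read in question
            }
        });
        th1.start(); th2.start(); th3.start();
        th1.join(); th2.join(); th3.join();  // join makes seen[0] visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10_000; i++) {
            int r = runOnce();
            if (r != -1 && r != 1) {
                System.out.println("thread 3 saw uninitialized f: " + r);
            }
        }
    }
}
```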

AFAICT, storeStore is a feature that we don't know how to specify,
that we're not sure is really useful in correct code, but that can
potentially be used to improve performance marginally on a few
microarchitectures for one ISA.  And it's likely to be a bug magnet.
That just doesn't seem like a very compelling case.

Hans

> -Doug
>