[jmm-dev] jdk9 APIs [Fences specifically]

Wed Aug 19 12:01:07 UTC 2015

On Friday 14 August 2015 19:36:55 Hans Boehm wrote:
> As I argued in my earlier message, storeStore
> really only makes sense for ordering prior writes to fields
> that are subsequently treated as read-only.  That's a really
> narrow application domain but probably a disproportional security
> bug magnet (for those people who care about Java security).
>
> If we were to generalize the final field ordering guarantees
> to other fields, I would argue that storeStore is no longer a safe
> implementation of the constructor barrier.  People would
> "naively" expect that if they just wrote a field in a constructor,
> and the object has not yet been published, it should read
> back as the value they just wrote.  This becomes an unsafe
> assumption if storeStore is used.
>
> My impression is that the "st" variant of "dmb" is only a win
> on a smallish number of microarchitectures.  Someone from
> ARM should confirm or deny.
>
> The more I think about it, the less confident I am that the use
> of "dmb ishst" is actually correct as a constructor barrier to
> start with.  Consider:
>
> Thread 1:
> t1.f = 1;   // Final field write in constructor
>        // Implicit constructor StoreStore fence ("dmb ishst") here.
> a = t1; // Correctly publish outside constructor
>
> Thread 2:
> t2.f2 = a;  // Final field write in constructor
>        // Implicit constructor StoreStore fence
> b = t2; // Correctly publish outside constructor
>
> Thread 3:
> if (b != null && b.f2 != null) {
>     t2 = b.f2.f;  // Guaranteed to see 1?
> }
>
> Is thread 3 guaranteed to see an initialized f?  Based on the
> ARM spec, I'm not sure either way.  In any case, this seems
> really hard to specify.  Yet it's likely that most real users will
> (or at least should) care.  Unlike the OOTA issues where we
> kind of all know what we mean, I don't think hand-waving works
> here.

We have looked at the example above.  If you look at the definition of
our StoreStore barrier, you will find that it is somewhat involved (and
not just core-local!).  With that definition under our belt, thread 3 in
the example is indeed guaranteed to see the initialised value of f.
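
For readers who want to experiment, the three-thread example quoted above
can be written out as plain Java.  This is only a sketch: the names
FinalFieldChain, T1, T2 and runOnce are hypothetical and not from the
thread, and the static fields a and b are deliberately non-volatile so
that publication is racy, as in the example.  A run that finds no
violation proves nothing about the barrier; it merely fails to find a
counterexample on the machine it runs on.

```java
// Illustrative sketch only; all class and method names are hypothetical.
final class FinalFieldChain {
    static final class T1 {
        final int f;
        T1() { f = 1; }            // final field write in constructor
    }

    static final class T2 {
        final T1 f2;
        T2(T1 seen) { f2 = seen; } // final field write in constructor
    }

    // Deliberately plain (non-volatile) fields: publication is racy.
    static T1 a;
    static T2 b;

    /** Returns -1 if thread 3 saw no complete chain, else the observed f. */
    static int runOnce() {
        a = null;
        b = null;
        final int[] observed = { -1 };

        Thread t1 = new Thread(() -> { a = new T1(); });
        Thread t2 = new Thread(() -> { b = new T2(a); });
        Thread t3 = new Thread(() -> {
            T2 local = b;
            if (local != null && local.f2 != null) {
                observed[0] = local.f2.f;   // the question: always 1?
            }
        });

        t1.start(); t2.start(); t3.start();
        try {
            t1.join(); t2.join(); t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }
}
```

Calling runOnce() in a loop and checking that it never returns 0 is the
obvious stress test.  Dedicated litmus-test tooling would be a better
fit for serious investigation than a loop like this.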

The question on the performance side with the stronger StoreStore
barrier is one that we cannot answer publicly, yet.

> AFAICT, storeStore is a feature we don't know how to specify,
> we're not sure is really useful in correct code,

We have a specification out there for a strong-ish StoreStore barrier
(at least judged by this example).  It effectively orders not just
thread-local stores, but also stores on other CPUs that are "logically"
after the StoreStore barrier (through an edge from a load that reads a
value stored after the barrier).  That way, these logically dependent
stores become globally ordered with respect to the stores before the
barrier -- reducing the impact of the absent multi-copy atomicity.
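
As a point of reference for the API side of this thread, here is a
minimal sketch of an explicit StoreStore fence between initialising a
plain field and racily publishing the object.  It assumes the static
fence methods on java.lang.invoke.VarHandle as they shipped in JDK 9.
The names StoreStorePublish, Box, publish, tryRead and runOnce are
hypothetical.  The reader-side loadLoadFence is a conservative choice of
this sketch, not something the discussion above prescribes, and the
sketch does not establish how either fence maps onto the ARM barrier
described here.  Note that value is written once before the fence and
treated as read-only afterwards, which is the narrow use case Hans
describes.

```java
// Illustrative sketch only; all class and method names are hypothetical.
final class StoreStorePublish {
    static final class Box {
        int value;                 // plain, non-final field
    }

    static Box shared;             // plain field: racy publication

    static void publish(int v) {
        Box box = new Box();
        box.value = v;             // store before the fence
        java.lang.invoke.VarHandle.storeStoreFence();
        shared = box;              // publishing store after the fence
    }

    /** Returns -1 if nothing has been published yet, else the value read. */
    static int tryRead() {
        Box box = shared;
        if (box == null) {
            return -1;
        }
        // Conservative reader-side fence; an assumption of this sketch.
        java.lang.invoke.VarHandle.loadLoadFence();
        return box.value;
    }

    static int runOnce(int v) {
        shared = null;
        final int[] seen = { -1 };
        Thread writer = new Thread(() -> publish(v));
        Thread reader = new Thread(() -> { seen[0] = tryRead(); });
        writer.start(); reader.start();
        try {
            writer.join(); reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```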

> but can potentially be used to improve performance marginally on a few
> microarchitectures for one ISA.

These barriers may indeed be more expensive than thread-local barriers
that merely push the stores out in order.  However, those weaker
barriers are complicated to reason about without any additional
multi-copy atomicity.
--
Sincerely,
  Stephan

Stephan Diestelhorst
ARM Research - Systems
